There are some other changes I want to make for the next rev so I’ll do that.

Nikaash, it would still be nice to verify this fixes your problem. Also, if
you want to create a Jira, it will guarantee I don’t forget.


On Apr 29, 2016, at 9:23 AM, Dmitriy Lyubimov <[email protected]> wrote:

yes -- i would do it as an optional parameter, just like par does: do nothing,
try auto, or try an exact number of splits
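
A minimal sketch of how such an optional parameter might be modelled, covering
the three behaviours named above (do nothing, auto, exact). All names here
(Parallelism, Auto, Exact, resolveSplits, autoGuess) are hypothetical and are
not part of Mahout; autoGuess stands in for whatever heuristic the engine
would apply.

```scala
// Hypothetical sketch only; none of these names are Mahout APIs.
object ParallelismOptionSketch {

  // The two explicit requests; "do nothing" is modelled as None.
  sealed trait Parallelism
  case object Auto extends Parallelism
  final case class Exact(splits: Int) extends Parallelism

  // Decide how many splits the input should end up with.
  //   current   - number of splits the input has now
  //   requested - None leaves the input untouched (pre-0.11 behaviour)
  //   autoGuess - assumed stand-in for the engine's automatic heuristic
  def resolveSplits(current: Int,
                    requested: Option[Parallelism],
                    autoGuess: Int => Int): Int =
    requested match {
      case None => current
      case Some(Auto) => autoGuess(current)
      case Some(Exact(n)) =>
        require(n > 0, "exact split count must be positive")
        n
    }
}
```

With this shape, a driver could pass Some(Auto) by default, while an app that
knows better passes Some(Exact(n)) or None.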

On Fri, Apr 29, 2016 at 9:15 AM, Pat Ferrel <[email protected]> wrote:
It’s certainly easy to put this in the driver, taking it out of the algo.

Dmitriy, is it a candidate for an Option param to the algo? That would catch 
cases where people rely on it now (like my old DStream example) but easily 
allow it to be overridden to None to imitate pre 0.11, or passed in when the 
app knows better.

Nikaash, are you in a position to comment out the .par(auto=true) and see if it 
makes a difference?


On Apr 29, 2016, at 8:53 AM, Dmitriy Lyubimov <[email protected]> wrote:

can you please look into spark UI and write down how many splits the job
generates in the first stage of the pipeline, or anywhere else there's
significant variation in # of splits between the two cases?

the row similarity is a very short pipeline (compared with a typical one),
so only the first input re-splitting is critical.

The splitting along the products is adjusted by the optimizer automatically to
match the amount of data segments observed on average in the input(s). e.g.
if you compute val C = A %*% B and A has 500 elements per split and B has
5000 elements per split then C would approximately have 5000 elements per
split (the larger average in binary operator cases).  That's approximately
how it works.
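
The rule of thumb above can be written down as a small sketch. This is only an
illustration of the arithmetic described, not the optimizer's actual code; the
names productElementsPerSplit and splitsFor are hypothetical, and the
ceiling-division step for turning a total element count into a split count is
an added assumption.

```scala
// Hypothetical sketch of the heuristic described above; not Mahout code.
object SplitSizeSketch {

  // For a binary operator such as C = A %*% B, the result is assumed to
  // carry roughly the larger of the two operands' average elements per split.
  def productElementsPerSplit(avgA: Long, avgB: Long): Long =
    math.max(avgA, avgB)

  // Assumption: number of splits needed to hold totalElements at the given
  // density, rounded up so that a partial last split still counts.
  def splitsFor(totalElements: Long, perSplit: Long): Long = {
    require(perSplit > 0, "elements per split must be positive")
    (totalElements + perSplit - 1) / perSplit
  }
}
```

For the example in the text, productElementsPerSplit(500, 5000) gives 5000, so
changing the input's split size up front changes the density that every later
product inherits.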

However, the par() that has been added is messing with initial parallelism,
which would naturally affect the rest of the pipeline per above. I now doubt
it was a good thing -- when i suggested that Pat try this, i did not mean to
put it _inside_ the algorithm itself, but rather into the accurate input
preparation code in his particular case. However, I don't think it will
work in every case. The sweet-spot parallelism for multiplication
unfortunately depends on tons of factors, such as network bandwidth and
hardware configuration, so it is difficult to give a good guess universally.
More likely, for cli-based prepackaged algorithms (I don't use CLI but rather
assemble pipelines in scala via scripting and scala application code) the
initial parallelization adjustment options should probably be provided to
CLI.
