Nikaash, yes, unfortunately you may need to tune parallelism manually for your particular load/cluster to get the best out of it. I guess Pat will be adding the option.
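To make the "option" idea concrete, here is a minimal sketch of the three modes discussed in the quoted thread (do nothing, try auto, or try an exact number of splits). It does not use Mahout at all: `ParallelismOption`, `Par`, `Auto`, `Exact` and `resolveSplits` are hypothetical names, and "auto" here simply means "use the number of available task slots", which is an assumption and not a description of what Mahout's `par(auto=true)` computes.

```scala
// Hypothetical sketch, not Mahout code: models an optional
// parallelism setting with three modes.
object ParallelismOption {
  sealed trait Par
  case object Auto extends Par                      // let the driver pick
  final case class Exact(splits: Int) extends Par   // caller knows better

  // Returns the number of splits to use for the input.
  // currentSplits: splits the input already has
  // clusterSlots:  available task slots (assumed stand-in for "auto")
  def resolveSplits(currentSplits: Int, clusterSlots: Int, par: Option[Par]): Int =
    par match {
      case None        => currentSplits            // do nothing (pre-0.11 behaviour)
      case Some(Auto)  => math.max(1, clusterSlots)
      case Some(Exact(n)) =>
        require(n > 0, "exact number of splits must be positive")
        n
    }
}
```

With this shape, passing `None` leaves the input untouched, while `Some(ParallelismOption.Exact(100))` lets the application override the guess when it knows its load and cluster.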
On Fri, Apr 29, 2016 at 11:14 AM, Nikaash Puri <[email protected]> wrote:

> Hi,
>
> Sure, I’ll do some more detailed analysis of the jobs on the UI and share
> screenshots if possible.
>
> Pat, yup, I’ll only be able to get to this on Monday, though. I’ll comment
> out the line and see the difference in performance.
>
> Thanks so much for helping guys, I really appreciate it.
>
> Also, the algorithm implementation for LLR is extremely performant, at
> least as of Mahout 0.10. I ran some tests for around 61 days of data (which
> in our case is a fair amount) and the model was built in about 20 minutes,
> which is pretty amazing. This was using a pretty decent sized cluster,
> though.
>
> Thank you,
> Nikaash Puri
>
> On 29-Apr-2016, at 10:18 PM, Pat Ferrel <[email protected]> wrote:
>
> There are some other changes I want to make for the next rev so I’ll do
> that.
>
> Nikaash, it would still be nice to verify this fixes your problem, also if
> you want to create a Jira it will guarantee I don’t forget.
>
>
> On Apr 29, 2016, at 9:23 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> yes -- i would do it as an optional option -- just like par does -- do
> nothing; try auto, or try exact number of splits
>
> On Fri, Apr 29, 2016 at 9:15 AM, Pat Ferrel <[email protected]> wrote:
>
>> It’s certainly easy to put this in the driver, taking it out of the algo.
>>
>> Dmitriy, is it a candidate for an Option param to the algo? That would
>> catch cases where people rely on it now (like my old DStream example) but
>> easily allow it to be overridden to None to imitate pre 0.11, or passed in
>> when the app knows better.
>>
>> Nikaash, are you in a position to comment out the .par(auto=true) and see
>> if it makes a difference?
>>
>>
>> On Apr 29, 2016, at 8:53 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> can you please look into spark UI and write down how many splits the job
>> generates in the first stage of the pipeline, or anywhere else there's
>> significant variation in # of splits in both cases?
>>
>> the row similarity is a very short pipeline (in comparison with what would
>> normally be on average). so only the first input re-splitting is critical.
>>
>> The splitting along the products is adjusted by optimizer automatically to
>> match the amount of data segments observed on average in the input(s). e.g.
>> if you compute val C = A %*% B and A has 500 elements per split and B has
>> 5000 elements per split then C would approximately have 5000 elements per
>> split (the larger average in binary operator cases). That's approximately
>> how it works.
>>
>> However, the par() that has been added, is messing with initial parallelism
>> which would naturally affect the rest of pipeline per above. I now doubt it
>> was a good thing -- when i suggested Pat to try this, i did not mean to put
>> it _inside_ the algorithm itself, rather, into the accurate input
>> preparation code in his particular case. However, I don't think it will
>> work in any given case. Actually sweet spot parallelism for multiplication
>> unfortunately depends on tons of factors -- network bandwidth and hardware
>> configuration, so it is difficult to give it a good guess universally. More
>> likely, for cli-based prepackaged algorithms (I don't use CLI but rather
>> assemble pipelines in scala via scripting and scala application code) the
>> initial parallelization adjustment options should probably be provided to
>> CLI.
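For reference, the A %*% B example from the thread can be worked through with a tiny standalone sketch. This only restates the rule of thumb described above (the product takes the larger elements-per-split average of the two inputs); it is not the optimizer's actual code. `SplitEstimate` and both function names are hypothetical, and deriving a split count by dividing total elements by that average is an added assumption for illustration.

```scala
// Hypothetical sketch of the rule of thumb from the thread,
// not the Mahout optimizer itself.
object SplitEstimate {
  // For a binary operator such as C = A %*% B, the result is assumed
  // to carry roughly the larger of the two per-split averages.
  def targetElementsPerSplit(aPerSplit: Long, bPerSplit: Long): Long =
    math.max(aPerSplit, bPerSplit)

  // Assumption for illustration: number of splits needed to hold
  // totalElements at the given per-split average (at least one).
  def estimatedSplits(totalElements: Long, perSplit: Long): Int = {
    require(perSplit > 0, "perSplit must be positive")
    math.max(1, math.ceil(totalElements.toDouble / perSplit).toInt)
  }
}
```

With the numbers from the thread, `SplitEstimate.targetElementsPerSplit(500L, 5000L)` gives 5000, so a product with, say, 1,000,000 elements would land at about 200 splits under this assumption. It also shows why changing the initial parallelism of the inputs carries through the rest of a short pipeline.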
