Hi,

Ok, so interestingly enough, when I repartition my input data across indicators 
by user ID, I get a significant speedup. This is probably because shuffle 
decreases, since RDD partitions with the same user IDs are more likely to be 
located on the same nodes. What’s even more interesting is the behaviour as a 
function of the number of partitions. 
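
To make the co-location intuition concrete, here is a minimal, Spark-free Python sketch of HashPartitioner-style assignment (the function and dataset names are illustrative, not from my actual job): two datasets partitioned the same way put identical user IDs in the same partition index, so combining them per user needs no cross-node data movement.

```python
from collections import defaultdict

def partition_for(key, num_partitions):
    # Mimics Spark's HashPartitioner rule: hash(key) mod num_partitions.
    return hash(key) % num_partitions

def partition(records, num_partitions):
    # Group (user_id, item_id) pairs by their partition index.
    parts = defaultdict(list)
    for user_id, item_id in records:
        parts[partition_for(user_id, num_partitions)].append((user_id, item_id))
    return parts

# Two illustrative indicator datasets, both keyed by user ID.
views = [("u1", "itemA"), ("u2", "itemB"), ("u1", "itemC")]
purchases = [("u1", "itemC"), ("u2", "itemD")]

n = 30  # slightly more than the number of cores
views_parts = partition(views, n)
purchases_parts = partition(purchases, n)

# All of u1's rows, across both datasets, land in a single partition index,
# so a per-user join between them incurs no shuffle traffic.
u1_indices = {partition_for(u, n) for u, _ in views + purchases if u == "u1"}
assert len(u1_indices) == 1
```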

Concretely, in my case I was using around 20 cores. Setting the number of 
partitions to 200 or more leads to more shuffle and poorer performance, while 
setting it to slightly more than the number of cores (30 in my case) gives 
significant speedups in the AtB calculations. Again, my guess is that shuffle 
is the reason.
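
A rough back-of-envelope model of why 200 partitions shuffles more than 30 (this is standard hash-shuffle bookkeeping, not a measurement from my job, and `shuffle_blocks` is an illustrative function, not a Spark API): each hash exchange writes roughly one shuffle block per (map partition, reduce partition) pair, so the block count grows with the square of the partition count while the average block shrinks.

```python
def shuffle_blocks(num_partitions):
    # One block per (map partition, reduce partition) pair in a
    # hash-partitioned exchange where both sides use the same count.
    return num_partitions * num_partitions

cores = 20
few = cores + 10  # 30 partitions, slightly more than the core count
many = 200

# Roughly 44x more (and correspondingly smaller) blocks at 200 partitions.
ratio = shuffle_blocks(many) / shuffle_blocks(few)
print(round(ratio, 1))  # -> 44.4
```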

I’ll keep experimenting and share more results.

All of these tests are with Spark 1.2.0 and Mahout 0.10. 

Thank you,
Nikaash Puri
> On 28-Apr-2016, at 2:50 AM, Pat Ferrel <[email protected]> wrote:
> 
> I have been using the same function through all those versions of Mahout. I’m 
> running on newer versions of Spark (1.4-1.6.2). With my datasets there has 
> been no slowdown. I assume that you are only changing the Mahout 
> version—leaving data, Spark, HDFS, and all config the same. In that case I 
> wonder if you are somehow running into limits of your machine, like memory? 
> Have you allocated a fixed executor memory limit?
> 
> There has been almost no code change to item similarity. Dmitriy, do you know 
> if the underlying AtB has changed? I seem to recall the partitioning was set 
> to “auto” around 0.11. As I recall, we were having problems with large 
> numbers of small part files from Spark Streaming causing partitioning 
> headaches. In some unexpected way the input structure was trickling down into 
> partitioning decisions made in Spark. 
> 
> The first thing I’d try is giving the job more executor memory; the second is 
> to upgrade Spark. A 3x increase in execution time is a pretty big deal if it 
> isn’t helped by these easy fixes, so can you share your data? 
> 
> On Apr 27, 2016, at 8:37 AM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> 0.11 targets 1.3+.
> 
> I don't have anything off the top of my head affecting A'B specifically,
> but I think there were some changes affecting in-memory multiplication
> (which is of course used in distributed A'B).
> 
> I am not particularly familiar with, nor do I remember, the details of row
> similarity off the top of my head; I really wish the original contributor
> would comment on that. Trying to see if I can come up with anything useful,
> though.
> 
> What behavior do you see in this job -- CPU-bound or I/O-bound?
> 
> There are a few pointers to look at:
> 
> (1) I/O often exceeds the input size many times over, so spills are
> inevitable. Tuning memory sizes, and checking the Spark spill locations to
> make sure the disks there are not slow, is critical. Also, I think Spark 1.6
> added a lot of flexibility in managing task/cache/shuffle memory sizes; it
> may help in some unexpected way.
> 
> (2) Sufficient cache: many pipelines commit reused matrices to cache
> (MEMORY_ONLY), which is the default Mahout algebra behavior, assuming there
> is enough cache memory for only good things to happen. If there is not,
> however, results that were evicted will be recomputed (not saying this is a
> known case for row similarity in particular); make sure this is not
> happening. For scatter-type exchanges it is especially bad.
> 
> (3) A'B -- try hacking and playing with the implementation in the AtB
> (Spark-side) class. See if you can come up with a better arrangement.
> 
> (4) In-memory computations (the MMul class), if that's the bottleneck, can in
> practice be quick-hacked with multithreaded multiplication and a bridge to
> native solvers (netlib-java), at least for the dense cases. This has been
> found to improve the performance of distributed multiplications a bit. It
> works best if you get 2 threads in the backend and all threads in the front
> end.
> 
> There are other known things that can improve multiplication speed over the
> public Mahout version; I hope Mahout will improve on those in the future.
> 
> -d
> 
> On Wed, Apr 27, 2016 at 6:14 AM, Nikaash Puri <[email protected]> wrote:
> 
>> Hi,
>> 
>> I’ve been working with LLR in Mahout for a while now, mostly using the
>> SimilarityAnalysis.cooccurenceIDss function. I recently upgraded the Mahout
>> libraries to 0.11, and subsequently also tried 0.12, and the same program
>> is running significantly slower (at least 3x, based on initial analysis).
>> 
>> Looking into the tasks more carefully, a comparison of 0.10 and 0.11 shows
>> that the amount of shuffle done in 0.11 is significantly higher, especially
>> in the AtB step. This could be a reason for the reduction in performance.
>> 
>> I am working on Spark 1.2.0, though, so it’s possible that this is causing
>> the problem. It works fine with Mahout 0.10.
>> 
>> Any ideas why this might be happening?
>> 
>> Thank you,
>> Nikaash Puri
> 
