Hi,

I’ve been working with LLR in Mahout for a while now, mostly using the SimilarityAnalysis.cooccurrencesIDSs function. I recently upgraded the Mahout libraries to 0.11, and subsequently also tried 0.12, and the same program is running significantly slower (at least 3x, based on initial analysis).
Looking into the tasks more carefully and comparing 0.10 with 0.11 shows that the amount of shuffle in 0.11 is significantly higher, especially in the AtB step. This could be a reason for the drop in performance. However, I am running on Spark 1.2.0, so it's possible that the Spark version is causing the problem. The same program works fine with Mahout 0.10.

Any ideas why this might be happening?

Thank you,
Nikaash Puri
