Crazy awesome.

> On Jul 5, 2014, at 4:19 PM, Pat Ferrel wrote:
> 
> I compared spark-itemsimilarity to the Hadoop version on sample data that is
> 8.7 M, 49,290 x 139,738, using my little two-machine cluster, and got the
> following speedup.
> 
> Platform         Elapsed time (h:mm:ss)
> Mahout Hadoop    0:20:37
> Mahout Spark     0:02:19
> 
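The ratio of the two elapsed times above can be checked with a few lines of arithmetic. This is a minimal sketch that only converts the h:mm:ss strings from the table to seconds; `to_seconds` is a hypothetical helper. For the similarity job alone the ratio is roughly 8.9x, while the "over 10x" figure quoted below is stated for the complete pipeline, which this table does not time.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss elapsed-time string (as in the table) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


hadoop = to_seconds("0:20:37")  # Mahout Hadoop: 1237 s
spark = to_seconds("0:02:19")   # Mahout Spark: 139 s
speedup = hadoop / spark

print(f"{speedup:.1f}x")  # prints "8.9x"
```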
> This isn't quite apples-to-apples, because the Spark version does all the
> dictionary management, which usually means two extra jobs tacked on before
> and after the Hadoop job. I've now run the complete pipeline with both Hadoop
> and Spark, and not only is the Spark version faster, but the old Hadoop way
> required keeping track of 10x more intermediate data and wiring up many more
> jobs to get the pipeline working. Now it's just one job. You no longer need
> to worry about ID translation, and you get over 10x faster completion. This
> is one of those times when speed meets ease of use.
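"Dictionary management" and "ID translation" refer to mapping external user and item IDs onto the dense integer row and column indices a similarity computation works on, and mapping results back afterwards. The sketch below illustrates that idea with in-memory maps, assuming string IDs. `IdDictionary` and `encode` are hypothetical names for illustration only, not Mahout's API or its actual implementation.

```python
from typing import Dict, List, Tuple


class IdDictionary:
    """Hypothetical two-way map between external string IDs and dense int indices."""

    def __init__(self) -> None:
        self._to_index: Dict[str, int] = {}
        self._to_id: List[str] = []

    def index(self, external_id: str) -> int:
        # Assign the next free index the first time an ID is seen.
        if external_id not in self._to_index:
            self._to_index[external_id] = len(self._to_id)
            self._to_id.append(external_id)
        return self._to_index[external_id]

    def external(self, index: int) -> str:
        # Translate an internal index back to the original ID.
        return self._to_id[index]


def encode(interactions: List[Tuple[str, str]],
           users: IdDictionary, items: IdDictionary) -> List[Tuple[int, int]]:
    """Turn (user, item) ID pairs into (row, column) index pairs."""
    return [(users.index(u), items.index(i)) for u, i in interactions]


users, items = IdDictionary(), IdDictionary()
rows = [("alice", "iphone"), ("bob", "ipad"), ("alice", "ipad")]
encoded = encode(rows, users, items)

print(encoded)            # [(0, 0), (1, 1), (0, 1)]
print(items.external(1))  # ipad
```

In the Hadoop workflow described above, this translation ran as separate jobs before and after the similarity job; folding it into one job is what removes the extra bookkeeping.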
