Crazy awesome.
> On Jul 5, 2014, at 4:19 PM, Pat Ferrel <[email protected]> wrote:
>
> I compared spark-itemsimilarity to the Hadoop version on sample data that is
> 8.7 M, 49290 x 139738, using my little 2 machine cluster and got the following
> speedup.
>
> Platform        Elapsed Time
> Mahout Hadoop   0:20:37
> Mahout Spark    0:02:19
>
> This isn’t quite apples to apples because the Spark version does all the
> dictionary management, which is usually two extra jobs tacked on before and
> after the Hadoop job. I’ve done the complete pipeline using Hadoop and Spark
> now and can say that not only is it faster now but the old Hadoop way
> required keeping track of 10x more intermediate data and connecting up many
> more jobs to get the pipeline working. Now it’s just one job. You don’t need
> to worry about ID translation anymore and you get over 10x faster completion.
> This is one of those times when speed meets ease-of-use.
