This is the latest GraphX-based ALS implementation that I'm aware of: https://github.com/ankurdave/spark/blob/GraphXALS/graphx/src/main/scala/org/apache/spark/graphx/lib/ALS.scala
When I benchmarked it last year, it was about twice as slow as MLlib's ALS, and I think the latter has gotten faster since then. The performance gap is due to some ALS-specific optimizations in the MLlib version that are hard to do using GraphX, such as storing the edges twice (partitioned by source and by destination) to reduce communication.

Ankur <http://www.ankurdave.com/>

On Tue, May 26, 2015 at 3:36 PM, Ben Mabey <b...@benmabey.com> wrote:

> I've heard in a number of presentations that Spark's ALS implementation was
> going to be moved over to a GraphX version. For example, this presentation
> on GraphX at the Spark Summit (slide #23) mentioned a 40 LOC version using
> the Pregel API:
> <https://databricks-training.s3.amazonaws.com/slides/graphx@sparksummit_2014-07.pdf>
> Looking at the ALS source on master
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala>,
> it looks like the original implementation is still being used and no use of
> GraphX can be seen. Other algorithms mentioned in the GraphX presentation
> can already be found in the repo
> <https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx/lib>,
> but I don't see ALS. Could someone link me to the GraphX version for
> comparison purposes? Also, could someone comment on why the newer version
> isn't in use yet (i.e., are there tradeoffs with using the GraphX version
> that make it less desirable)?
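The next example makes the "store the edges twice" idea concrete. It is a toy, single-machine sketch and not MLlib's or GraphX's code. The object `TwoIndexAls` and its members `Rating`, `index`, `solve` and `train` are hypothetical names. In-memory `Map`s stand in for partitions, and the factors are rank 1 to keep the least-squares solve to one line.

The ratings are grouped once by user and once by item. Each half-step of ALS can then read its key's ratings from the matching copy, and only the factor values from the other side need to move between steps.

```scala
// Toy, single-machine sketch (hypothetical names, not MLlib's implementation):
// rank-1 ALS that keeps every rating twice, grouped by user and grouped by item.
object TwoIndexAls {
  final case class Rating(user: Int, item: Int, value: Double)

  // Build both copies of the "edges" once, up front:
  // byUser maps user -> (item, rating); byItem maps item -> (user, rating).
  def index(ratings: Seq[Rating]): (Map[Int, Seq[(Int, Double)]], Map[Int, Seq[(Int, Double)]]) = {
    val byUser = ratings.groupBy(_.user).map { case (u, rs) => u -> rs.map(r => (r.item, r.value)) }
    val byItem = ratings.groupBy(_.item).map { case (i, rs) => i -> rs.map(r => (r.user, r.value)) }
    (byUser, byItem)
  }

  // One ALS half-step for rank 1: each key's factor is the regularized
  // least-squares solution x = sum(r * y) / (sum(y * y) + lambda), computed
  // from that key's own ratings plus the other side's current factors.
  def solve(grouped: Map[Int, Seq[(Int, Double)]], other: Map[Int, Double], lambda: Double): Map[Int, Double] =
    grouped.map { case (k, rs) =>
      val num = rs.map { case (j, r) => r * other(j) }.sum
      val den = rs.map { case (j, _) => other(j) * other(j) }.sum + lambda
      k -> num / den
    }

  // Alternate the two half-steps, each reading from the copy keyed by the side it solves.
  def train(ratings: Seq[Rating], iterations: Int, lambda: Double): (Map[Int, Double], Map[Int, Double]) = {
    val (byUser, byItem) = index(ratings)
    var itemF: Map[Int, Double] = byItem.keys.map(_ -> 1.0).toMap // initial item factors
    var userF: Map[Int, Double] = Map.empty
    for (_ <- 1 to iterations) {
      userF = solve(byUser, itemF, lambda) // uses the user-keyed copy
      itemF = solve(byItem, userF, lambda) // uses the item-keyed copy
    }
    (userF, itemF)
  }
}
```

Take exactly rank-1 data, such as ratings 1, 3, 2 and 6 for (user, item) pairs (0,0), (0,1), (1,0) and (1,1), with `lambda = 0`. In that case `userF(u) * itemF(i)` reproduces each input rating. A distributed version would keep the two copies in separate partitionings, so the ratings themselves never need to be reshuffled between half-steps.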