This is the latest GraphX-based ALS implementation that I'm aware of:
https://github.com/ankurdave/spark/blob/GraphXALS/graphx/src/main/scala/org/apache/spark/graphx/lib/ALS.scala

When I benchmarked it last year, it was about twice as slow as MLlib's ALS,
and I think the latter has gotten faster since then. The performance gap is
because the MLlib version implements some ALS-specific optimizations that
are hard to do using GraphX, such as storing the edges twice (partitioned
by source and by destination) to reduce communication.
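
To make the "storing the edges twice" point concrete, here is a minimal sketch in plain Python. It is not the MLlib or GraphX code: the function names (index_edges_twice, solve_side, als_rank1) are hypothetical, it runs on a single machine, and it uses a rank-1 model so the least-squares solve reduces to one division. The idea it illustrates is that each half-step of ALS reads only the copy of the ratings grouped by the side being updated, so in a distributed setting that copy can stay put while only the other side's factors move.

```python
from collections import defaultdict


def index_edges_twice(ratings):
    """Store every (user, item, rating) edge twice: keyed by user and keyed by item."""
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((item, rating))
        by_item[item].append((user, rating))
    return dict(by_user), dict(by_item)


def solve_side(grouped, other_factors, reg):
    """Rank-1 regularized least-squares update for every key in `grouped`.

    Only `other_factors` has to be looked up; the ratings for each key are
    already co-located with that key.
    """
    updated = {}
    for key, neighbors in grouped.items():
        num = sum(r * other_factors[n] for n, r in neighbors)
        den = sum(other_factors[n] ** 2 for n, _ in neighbors) + reg
        updated[key] = num / den
    return updated


def als_rank1(ratings, iterations=10, reg=0.1):
    """Alternate between the user-keyed copy and the item-keyed copy."""
    by_user, by_item = index_edges_twice(ratings)
    item_f = {item: 1.0 for item in by_item}  # arbitrary non-zero start
    user_f = {}
    for _ in range(iterations):
        user_f = solve_side(by_user, item_f, reg)  # uses the by-user copy
        item_f = solve_side(by_item, user_f, reg)  # uses the by-item copy
    return user_f, item_f
```

With a single edge list, as in a graph abstraction that partitions edges only one way, one of the two half-steps has to regroup or ship the ratings on every iteration; keeping both copies costs memory but avoids that repeated shuffle.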

Ankur <http://www.ankurdave.com/>

On Tue, May 26, 2015 at 3:36 PM, Ben Mabey <b...@benmabey.com> wrote:

> I've heard in a number of presentations that Spark's ALS implementation was
> going to be moved over to a GraphX version. For example, this
> presentation on GraphX
> <https://databricks-training.s3.amazonaws.com/slides/graphx@sparksummit_2014-07.pdf> (slide
> #23) at the Spark Summit mentioned a 40 LOC version using the Pregel API.
> Looking at the ALS source on master
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala>
> it looks like the original implementation is still being used and no use of
> GraphX can be seen. Other algorithms mentioned in the GraphX presentation
> can be found in the repo
> <https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx/lib>
> already, but I don't see ALS. Could someone link me to the GraphX version
> for comparison purposes? Also, could someone comment on why the newer
> version isn't in use yet (i.e. are there tradeoffs with using the GraphX
> version that make it less desirable)?
>
