The GraphLab guys benchmark their ALS implementation against an old version of ours and describe in detail why they achieve a 40x to 60x performance improvement. They attribute most of the overhead to Hadoop and its programming model.
It's in the left column of page 724 of http://vldb.org/pvldb/vol5/p716_yuchenglow_vldb2012.pdf

On 11.03.2013 21:35, Dmitriy Lyubimov wrote:
> Exactly! I have always thought that was the main reason why ALS in Giraph
> was faster.
>
> Doesn't it make a strong case for a hybrid environment? Anyway, what I am
> saying is: isn't it more or less truthful to say that, in pragmatic terms, the
> ALS stuff in Mahout is lagging for the very reason of Mahout being constrained
> to MR?
>
> On Mon, Mar 11, 2013 at 1:16 PM, Ted Dunning wrote:
>
>> Kinda sorta.
>>
>> You can defeat most of the sort if you want to just hash things to buckets.
>>
>> On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov wrote:
>>
>>> The sort component adds a log factor to
>>> the asymptotic complexity, whereas it is clear that any streaming merge
>>> algorithm just wouldn't need to sort and could capitalize on the structure
>>> we already know. (Sure, you can do it map-side with specific streaming join
>>> logic, but that would not be pure MR but rather some map task acrobatics.)
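To make the sort-vs-hash point concrete, here is a minimal Python sketch, not Mahout or Hadoop code (both function names are hypothetical). group_by_sort groups (key, value) pairs by sorting them first, in the spirit of the sort-based MR shuffle, which costs O(n log n). group_by_hash hashes each key into one of a few buckets and aggregates per key, which takes expected O(n) work but gives up any global key order.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sort(records):
    """Sort all (key, value) pairs by key, then scan runs of equal keys.

    The sort dominates: O(n log n) comparisons. Python's sort is stable,
    so values keep their input order within each key.
    """
    ordered = sorted(records, key=itemgetter(0))
    return {key: [v for _, v in run] for key, run in groupby(ordered, key=itemgetter(0))}


def group_by_hash(records, num_buckets=4):
    """Hash each key to a bucket and append values per key inside it.

    Expected O(n) work and no sort; keys come out in no particular order.
    num_buckets is an illustrative parameter, standing in for reducer partitions.
    """
    buckets = [defaultdict(list) for _ in range(num_buckets)]
    for key, value in records:
        buckets[hash(key) % num_buckets][key].append(value)
    merged = {}
    for bucket in buckets:
        # Each key lives in exactly one bucket, so update() never overwrites.
        merged.update(bucket)
    return merged


records = [("u2", 1.0), ("u1", 3.0), ("u2", 4.0), ("u3", 2.0), ("u1", 5.0)]
print(group_by_sort(records) == group_by_hash(records))
```

Both functions yield the same groups. Only the sort-based version also delivers the keys in order, and that ordering is the part that costs the extra log factor when the downstream step does not need it.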
