The GraphLab guys benchmark their ALS implementation against an old
version of ours and describe in detail why they achieve a 40x to 60x
performance improvement. They attribute most of the overhead to Hadoop
and its programming model.

It's in the left column of page 724 of
http://vldb.org/pvldb/vol5/p716_yuchenglow_vldb2012.pdf

On 11.03.2013 21:35, Dmitriy Lyubimov wrote:
> Exactly! I have always thought that was the main reason why ALS in Giraph
> was faster.
> 
> Doesn't it make a strong case for a hybrid environment? Anyway, what I am
> saying is: isn't it more or less truthful to say that, in pragmatic terms,
> the ALS stuff in Mahout is lagging precisely because Mahout is constrained
> to MR?
> 
> 
> 
> On Mon, Mar 11, 2013 at 1:16 PM, Ted Dunning <[email protected]> wrote:
> 
>> Kinda sorta..
>>
>> You can defeat most of the sort if you want to just hash things to buckets.
>>
>> On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> The sort component adds a log factor to the asymptotic complexity, whereas
>>> it is clear that any streaming merge algorithm just wouldn't need to sort
>>> and could capitalize on the structure we already know. (Sure, you can do it
>>> map-side with specific streaming-join logic, but that would not be pure MR
>>> but rather some map-task acrobatics.)
>>>
>>
> 
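To make the cost argument above concrete, here is a minimal, single-machine Python sketch. It is not Mahout, Hadoop, or GraphLab code. The function names (`group_by_sorting`, `group_by_hashing`, `merge_join`) and the toy `(key, value)` records are hypothetical. The sketch contrasts three approaches: an MR-style shuffle that sorts everything by key (O(n log n)), Ted's alternative of hashing keys to buckets (expected O(n), no sort), and Dmitriy's streaming merge, which joins two inputs in one linear pass when they are already known to be sorted by key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, str]  # hypothetical (key, value) record, e.g. (user_id, payload)


def group_by_sorting(pairs: Iterable[Pair]) -> Dict[int, List[str]]:
    """MR-style shuffle: sort all records by key, then collect runs of equal keys.
    The sort dominates the cost at O(n log n)."""
    groups: Dict[int, List[str]] = {}
    # sorted() is stable, so values for the same key keep their input order
    for key, value in sorted(pairs, key=lambda kv: kv[0]):
        groups.setdefault(key, []).append(value)
    return groups


def group_by_hashing(pairs: Iterable[Pair], num_buckets: int = 4) -> Dict[int, List[str]]:
    """Hash each key to a bucket and group within the bucket; expected O(n), no sort."""
    buckets: List[Dict[int, List[str]]] = [defaultdict(list) for _ in range(num_buckets)]
    for key, value in pairs:
        buckets[hash(key) % num_buckets][key].append(value)
    # Each key lands in exactly one bucket, so merging the buckets cannot collide
    merged: Dict[int, List[str]] = {}
    for bucket in buckets:
        merged.update(bucket)
    return merged


def merge_join(left: Iterable[Pair], right: Iterable[Pair]) -> Iterator[Tuple[int, str, str]]:
    """Streaming merge join of two inputs ALREADY sorted by key: one pass, O(n + m).
    Simplifying assumption: keys are unique within each input."""
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)  # left key has no match; advance left
        elif l[0] > r[0]:
            r = next(right_it, None)  # right key has no match; advance right
        else:
            yield l[0], l[1], r[1]  # keys match; emit the joined record
            l = next(left_it, None)
            r = next(right_it, None)
```

For example, `merge_join([(1, "u1"), (2, "u2"), (4, "u4")], [(2, "i2"), (3, "i3"), (4, "i4")])` yields `(2, "u2", "i2")` and `(4, "u4", "i4")` without ever sorting, because it relies on ordering the inputs already have. That is the "structure we already know" that a generic MR shuffle discards.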
