Re: Online machine learning on top of Hama BSP

Sebastian Schelter Sat, 26 May 2012 05:05:33 -0700

Hi Thomas,

I think that none of us wants to start a flame war here.


As a disclaimer, I have to note that I'm biased towards Giraph as well
because, besides my engagement at Mahout, I'm a committer and PMC member of
Giraph.

Regarding commit statistics: a single commit can correct a comment or
rewrite a whole layer of an application, so looking at the raw number of
commits is useless.

In my personal opinion, Mahout will have to move away from
Hadoop/MapReduce for a lot of problems. The question of which alternative
execution model to integrate is a hard one, as is deciding when
this should happen. The answer to that question will determine the
future of Mahout, and a discussion about this should be calm.

I think the real question is whether BSP itself is the optimal execution
model (regardless of the flavor of implementation) or whether Mahout
would be better off waiting for a viable implementation of an asynchronous
execution model similar to what is implemented in GraphLab.

--sebastian

On 26.05.2012 11:26, Thomas Jungblut wrote:
> Hi Ted,
> 
> please keep this factual, we are not here to start a flame war.
> But to correct you, if you take a closer look at the mailing list
> statistics [1]:
> hama-commits: 1.51 mails per day (AVG)
> As opposed to Giraph:
> giraph-commits: 0.68 mails per day (AVG)
> So our development is faster than Giraph's.
> Also, we work on top of HDFS, so you can combine MapReduce jobs with BSP
> jobs easily.
> We are just not running inside of MapReduce; this will become irrelevant
> anyway once YARN has a stable release.
> Currently Hama can operate on YARN with its own ApplicationMaster, whereas
> Giraph still needs to be on top of MapReduce.
> 
> Now to you Sebastian,
> 
>> Interesting discussion, which examples do you have in mind that might be
>> easier representable in general BSP than in Giraph/Pregel?
> 
> 
> Straightforward translations from MPI, for example. One of us is
> currently working on an SVM implementation in BSP, which originally was
> based on MPI. [2]
> We would love to have this contributed to Mahout, but if Ted is not
> interested in Hama we will put this in our modules.
> Also there are graph problems that need major supervision like Top-K
> Shortest Paths, which cannot be easily expressed with aggregators.
> 
> We have benchmarks showing the scalability and maturity of Hama [3] and
> would be glad to roll out to several other Apache projects.
> BTW, it would be cool if we could compare the performance of your k-means in
> MapReduce with that of our BSP version; you can see the benchmark in [3] as
> well.
> 
> Actually, that was not why we are here; we wanted to gauge general
> interest in real-time recommendation with Hama, since all the ML guys are
> here. Even if Ted is a fanboy of Giraph ;)
> 
> Regards from Berlin,
> Thomas
> 
> [1] http://pulse.apache.org/#incubator.apache.org
> [2] http://code.google.com/p/psvm/
> [3] http://wiki.apache.org/hama/Benchmarks
> 
> 
> 2012/5/26 Ted Dunning <[email protected]>
> 
>> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[email protected]> wrote:
>>
>>>> Compared with Hama, what's the advantage of giraph? probably
>>>
>>> probably mature implementation? :D
>>>
>>
>> Yes.  And very active community.  And recent history of rapid development.
>>  And easy compatibility with map-reduce programs.
>>
> 
> 
>
