Hi Thomas,

I think that none of us wants to start a flame war here.
As a disclaimer, I have to remark that I'm biased towards Giraph as well,
because besides my engagement at Mahout, I'm a committer and PMC member of
Giraph.

Regarding commit statistics: a single commit can correct a comment or
rewrite a whole layer of an application, so looking at the raw number of
commits is useless.

In my personal opinion, Mahout will have to move away from Hadoop/MapReduce
for a lot of problems. The question of which alternative execution model to
integrate is a hard one, as is deciding when this should happen. The answer
to that question will determine the future of Mahout, and the discussion
about it should be calm.

I think the real question is whether BSP itself is the optimal execution
model (regardless of the flavor of implementation), or whether Mahout would
do better to wait for a viable implementation of an asynchronous execution
model similar to what is implemented in GraphLab.

--sebastian

On 26.05.2012 11:26, Thomas Jungblut wrote:
> Hi Ted,
>
> please keep this factual, we are not here to start a flame war.
> But to correct you, if you take a closer look at the mailing list
> statistics [1]:
> hama-commits: 1.51 mails per day (AVG)
> Opposed to giraph:
> giraph-commits: 0.68 mails per day (AVG)
> So we have faster development than giraph.
> Also we work on top of HDFS, so you can combine mapreduce jobs with BSP
> jobs easily.
> We are just not running inside of MapReduce; these things will become
> negligible anyway once YARN has a stable release.
> Currently Hama can operate on YARN with its own ApplicationMaster whereas
> Giraph still needs to be on top of MapReduce.
>
> Now to you Sebastian,
>
>> Interesting discussion, which examples do you have in mind that might be
>> easier representable in general BSP than in Giraph/Pregel?
>
> straightforward translations from MPI, for example. One of us is
> currently working on an SVM implementation in BSP, which originally was
> based on MPI. [2]
> We would love to have this contributed to mahout, but if Ted is not
> interested in Hama we will put this in our modules.
> Also there are graph problems that need major supervision like Top-K
> Shortest Paths, which cannot be easily expressed with aggregators.
>
> We have benchmarks showing the scalability and maturity of Hama [3] and
> would be glad to roll out to several other Apache projects.
> BTW it would be cool if we could compare the performance of your k-means in
> MapReduce with that of our BSP version, you see the benchmark in [3] as
> well.
>
> Actually that was not why we are here, we wanted to hear some general
> interest in real-time recommendation with Hama since all the ML guys are
> here. Even if Ted is a fanboy of giraph ;)
>
> Regards from Berlin,
> Thomas
>
> [1] http://pulse.apache.org/#incubator.apache.org
> [2] http://code.google.com/p/psvm/
> [3] http://wiki.apache.org/hama/Benchmarks
>
>
> 2012/5/26 Ted Dunning <[email protected]>
>
>> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[email protected]> wrote:
>>
>>>> Compared with Hama, what's the advantage of giraph? probably
>>>
>>> probably mature implementation? :D
>>>
>>
>> Yes. And very active community. And recent history of rapid development.
>> And easy compatibility with map-reduce programs.
>>
