Hi Thomas,

I am by no means a Giraph expert (not yet ;-)). However, I want to share my opinions.
Thomas Quintana wrote:
> Keep in mind that I'm aware of the obvious:
> - Hadoop is proven
> - People know how to use it
> - There are a lot of deployments out there

Do you need more? ;-)

> I know it seems like a bit of a rhetorical question but the reason I ask is
> because it doesn't seem to be the right tool for the job as it was not
> designed with this purpose in mind. Maybe something like Mesos would be
> worth investigating as it allows multiple distributed data processing
> frameworks to execute side-by-side and doesn't require users to maintain
> multiple infrastructures to run the different frameworks. I look forward to
> your response and hope that some good may come out of this discussion.

This is true if you need to run multiple processing frameworks side-by-side. However, many users are probably happy with tools from the Hadoop 'ecosystem': HBase, Pig, Hive, Giraph, Cascading, Oozie (add here your favorite project that works on top of HDFS/MapReduce), etc. They all work on top of HDFS and MapReduce and, if they solve your 'big data' problems, there is little need for yet another cluster/resource manager.

Moreover, companies providing support and solutions for Hadoop do not currently use or support Mesos to deploy and manage Hadoop clusters (IMHO, this is another factor to consider).

Last but not least, the next-generation MapReduce architecture (a.k.a. YARN) is coming, and I guess it will further reduce the need for yet another resource manager. It will also allow and embrace paradigms other than MapReduce (such as Pregel, MPI, etc.), as well as make it possible to run different and/or customized versions of the MapReduce APIs.

Having shared my opinions, I do not want to take anything away from a project such as Mesos (about which I know very little), and I am happy to see it in the Apache Incubator :-) (the documentation also isn't that bad, thanks).

Paolo

>
> Best Regards,
> Thomas
>
