Ted brought up some of the superficial differences, but if you want to understand the technical differences, there are a number of those as well. Mesos and Hadoop next-gen have similar goals (more efficient resource sharing in data centers), but they come at the problem from different angles. HNG is currently focused mainly on MapReduce and aims to support other types of applications too, while Mesos was designed to support a very diverse set of applications, including long-running services as well as batch jobs (rather than only multiple instances of MapReduce), and is in fact already being used that way. More importantly, HNG is really two pieces: a refactoring of MapReduce that allows one instance of MR per application, and a resource manager called YARN that lets these instances coordinate. We are also going to support running the new MR2 application masters on top of Mesos instead of YARN (indeed, the refactoring is nice because it will let Hadoop MapReduce run on other cluster scheduling systems in the future).
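One way to picture why the refactoring lets MR2 application masters run on Mesos as well as YARN is a per-job master written against an abstract resource-manager interface. This is a minimal, purely illustrative sketch: `ResourceManagerClient`, `YarnLikeClient`, `MesosLikeClient`, `Container` and `MapReduceAppMaster` are hypothetical names, not Hadoop or Mesos APIs, and both backends are simple in-memory stand-ins.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Container:
    """A slice of one machine granted to an application (hypothetical type)."""
    host: str
    cpus: int
    mem_mb: int


class ResourceManagerClient(ABC):
    """Hypothetical interface a per-job master uses to obtain containers."""

    @abstractmethod
    def acquire(self, cpus: int, mem_mb: int, count: int) -> list[Container]:
        """Return up to `count` containers of the requested size."""


class YarnLikeClient(ResourceManagerClient):
    """Stand-in for a request-based backend: grants containers round-robin over hosts."""

    def __init__(self, hosts: list[str]):
        self.hosts = hosts

    def acquire(self, cpus, mem_mb, count):
        return [Container(self.hosts[i % len(self.hosts)], cpus, mem_mb)
                for i in range(count)]


class MesosLikeClient(ResourceManagerClient):
    """Stand-in for an offer-based backend: carves containers out of pending offers."""

    def __init__(self, offers: list[tuple[str, int, int]]):
        self.offers = list(offers)  # (host, free cpus, free memory in MB)

    def acquire(self, cpus, mem_mb, count):
        granted = []
        remaining = []
        for host, free_cpus, free_mem in self.offers:
            # Take as many containers as fit in this offer, up to `count` in total.
            while free_cpus >= cpus and free_mem >= mem_mb and len(granted) < count:
                granted.append(Container(host, cpus, mem_mb))
                free_cpus -= cpus
                free_mem -= mem_mb
            remaining.append((host, free_cpus, free_mem))
        self.offers = remaining  # keep whatever was not used
        return granted


class MapReduceAppMaster:
    """Hypothetical per-job master: it only talks to the abstract client."""

    def __init__(self, rm: ResourceManagerClient):
        self.rm = rm

    def run(self, num_map_tasks: int) -> dict[int, str]:
        containers = self.rm.acquire(cpus=1, mem_mb=1024, count=num_map_tasks)
        # Map each task index to the host it was placed on.
        return {i: c.host for i, c in enumerate(containers)}
```

Because the master depends only on `acquire()`, swapping the backend needs no change to the job logic. A real Mesos framework would instead receive offers asynchronously through scheduler callbacks rather than pulling them synchronously as this sketch does.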
In terms of technical differences, here are some of the main ones at present:

- Mesos is implemented in C++ rather than Java, and has APIs in C++ and Python in addition to Java.
- The resource allocation models are different. HNG has a central scheduler that supports data locality constraints, while Mesos provides "resource offers" that let applications pick the resources they like according to other criteria as well, in addition to requests/filters that describe which resources you want to be offered. Our belief is that resource offers will allow Mesos to support a wider range of application scheduling needs, while also making the system more scalable and highly available (by minimizing the state kept and the work done by the master).
- Mesos can enforce resource isolation through Linux Containers to guard against misbehaving or greedy tasks.
- HNG supports Kerberos authentication for users.
- HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, Spark and MPI.
- There are some smaller architectural differences that may matter for some applications, such as communication being based on message passing in Mesos vs. periodic heartbeats in HNG. Message passing lets Mesos provide lower scheduling latencies (e.g. to remain efficient even if your tasks take 100ms each).

Overall, however, as Ted said, many of these differences will likely go away as both projects add features. What will be interesting is whether some fundamental differences in the target workloads remain, which I think is likely. For example, the main deployment of Mesos today runs long-running stream processing services at Twitter, which is something typical Hadoop environments just don't do and which places different demands on the cluster scheduler. I also believe we're going to see many other cluster scheduling systems besides Mesos and HNG in the future, as people's requirements for these systems grow.
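To make the resource-offer model from the list above a little more concrete, here is a toy, single-threaded Python sketch. It assumes a simplified world where each offer is a whole host and an accepted offer goes entirely to one framework. The master knows nothing about framework preferences. It simply offers free resources, each framework accepts or declines according to its own criterion (data locality here), and declines are remembered as filters so the same host is not re-offered to that framework. `Offer`, `Framework`, `Master` and `resource_offer` are hypothetical names, not the actual Mesos API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one host, offered by the master (hypothetical type)."""
    host: str
    cpus: int
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework that only wants hosts holding its input data."""
    name: str
    preferred_hosts: set[str]
    cpus_per_task: int
    pending_tasks: int
    launched: list[tuple[str, str]] = field(default_factory=list)

    def resource_offer(self, offer: Offer) -> bool:
        """Accept (launch tasks on) the offer and return True, or decline with False."""
        if offer.host not in self.preferred_hosts:
            return False  # the framework's own criterion: data locality
        n = min(self.pending_tasks, offer.cpus // self.cpus_per_task)
        if n == 0:
            return False  # nothing left to run, or the offer is too small
        self.launched.extend([(self.name, offer.host)] * n)
        self.pending_tasks -= n
        return True


class Master:
    """Hypothetical master: it keeps no per-framework preferences, only filters."""

    def __init__(self, frameworks: list[Framework]):
        self.frameworks = frameworks
        # (framework name, host) pairs that a framework has declined.
        self.filters: set[tuple[str, str]] = set()

    def offer_round(self, free: list[Offer]) -> list[Offer]:
        """Offer each free host to frameworks in turn; return offers nobody took."""
        unclaimed = []
        for offer in free:
            accepted = False
            for fw in self.frameworks:
                if (fw.name, offer.host) in self.filters:
                    continue  # don't re-offer a host this framework declined
                if fw.resource_offer(offer):
                    accepted = True
                    break  # this sketch hands the whole offer to one framework
                self.filters.add((fw.name, offer.host))
            if not accepted:
                unclaimed.append(offer)
        return unclaimed
```

For example, with a "hadoop" framework preferring `h1` and a "spark" framework preferring `h2`, a round over `h1`, `h2` and `h4` places hadoop's tasks on `h1`, spark's on `h2`, and leaves `h4` unclaimed. In real Mesos, filters are time-limited and unused parts of an accepted offer return to the pool. This sketch keeps filters forever and does neither.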
There are some very challenging problems in designing a general cluster scheduling system that even the Google folks are still working hard on.

Matei

On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:

> Thanks for your nice and quick explanation!
>
> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning wrote:
>> Technically speaking, Mesos has a less expressive model for expressing
>> resource requirements. The thesis of Mesos is that the negotiation between
>> application and scheduler can make up for this missing information. Mesos
>> was also first to "market", but Hadoop nextGen is catching up fast. The
>> MR-279 has code that works, albeit with some issues in production use. From
>> all reports, these issues are being resolved quickly as Yahoo's considerable
>> QA resources come to bear.
>>
>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>> outward appearances, indicates a nearly inactive project. There is some
>> evidence that considerable activity is occurring off-list, but this is a
>> process bug in the Apache model since "if it doesn't happen on the list, it
>> doesn't happen".
>>
>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>> behind it. Since HNG has the potential to break down some of the deadlocks
>> that have plagued the Hadoop community release process, there is
>> considerable enthusiasm for it.
>>
>> Combined, these factors make it much more likely that HNG will be the
>> dominant force in the Hadoop world. That is, more likely in my own
>> estimation. Others may differ.
>>
>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon wrote:
>>
>>> Hi,
>>>
>>> I'm a newbie, and wonder what the main differences are between Hadoop
>>> nextGen and Mesos.
>>>
>>> Thanks.
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
