Understood.

On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <[email protected]> wrote:
> I wouldn't say it's designed for Yahoo! only, but it's definitely meant to
> solve issues they saw with large Hadoop clusters (and provides a lot of value
> for that).
>
> Matei
>
> On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
>
>> Hmm, HNG seems designed for their (Y!) own circumstance.
>>
>> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]>
>> wrote:
>>> Ted brought up some superficial differences, but if you want to understand
>>> technical differences, there are a bunch of those as well. Mesos and Hadoop
>>> next-gen have similar goals (more efficient resource sharing for data
>>> centers), but they are coming at it from different angles -- HNG is
>>> currently mainly focusing on MapReduce and aims to support other types of
>>> applications too, while Mesos was meant to support a very diverse set of
>>> applications, including long-running services and batch jobs (rather than
>>> only multiple instances of MapReduce), and is in fact being used for that
>>> already. More importantly, HNG is really two pieces -- a refactoring of
>>> MapReduce to allow one instance of MR per application, and a resource
>>> manager called YARN that lets these instances coordinate. We are going to
>>> support having the new MR2 application masters run on top of Mesos instead
>>> of YARN too (and indeed the refactoring is nice because it will enable
>>> Hadoop MapReduce to run on other cluster scheduling systems in the future).
>>>
>>> In terms of the technical differences, here are some of the main ones
>>> currently:
>>>
>>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and
>>> Python in addition to Java.
>>>
>>> - The resource allocation models are different: HNG has a central scheduler
>>> that supports data locality constraints, while Mesos provides "resource
>>> offers" to let applications pick the resources they like according to other
>>> criteria in addition to requests/filters to describe which resources you
>>> want to be offered. Our belief is that resource offers will allow Mesos to
>>> support a wider range of application scheduling needs, while simultaneously
>>> making the system more scalable and highly available (minimizing the state
>>> and work required of the master).
>>>
>>> - Mesos can enforce resource isolation through Linux Containers to guard
>>> against misbehaving / greedy tasks.
>>>
>>> - HNG supports Kerberos authentication for users.
>>>
>>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20,
>>> Spark and MPI.
>>>
>>> - There are some smaller architectural differences that may matter for some
>>> applications, such as communication being based on message-passing in Mesos
>>> vs periodic heartbeats in HNG, which allows Mesos to provide lower
>>> scheduling latencies (e.g. to still be efficient if your tasks take 100ms
>>> each).
>>>
>>> However, overall, as Ted said, many of these differences will likely go
>>> away as both projects add features. What will be interesting is whether
>>> some fundamental differences in the target workloads remain, which I think
>>> is likely to happen. For example, the main deployment of Mesos is currently
>>> to run long-running stream processing services at Twitter, which is
>>> something that typical Hadoop environments just don't do and that requires
>>> different things from the cluster scheduler. I also believe we're going to
>>> see a lot of other cluster scheduling systems besides Mesos and HNG in the
>>> future, as people's requirements for these systems grow.
>>> There are some very challenging problems in designing a general cluster
>>> scheduling system that even the Google folks are still working hard on.
>>>
>>> Matei
>>>
>>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>>>
>>>> Thanks for your nice and quick explanation!
>>>>
>>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>>>>> Technically speaking, Mesos has a less expressive model for expressing
>>>>> resource requirements. The thesis of Mesos is that the negotiation
>>>>> between application and scheduler can make up for this missing
>>>>> information. Mesos was also first to "market", but Hadoop nextGen is
>>>>> catching up fast. The MR-279 has code that works, albeit with some
>>>>> issues in production use. From all reports, these issues are being
>>>>> resolved quickly as Yahoo's considerable QA resources come to bear.
>>>>>
>>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>>>>> outward appearances, indicates a nearly inactive project. There is some
>>>>> evidence that considerable activity is occurring off-list, but this is a
>>>>> process bug in the Apache model since "if it doesn't happen on the list,
>>>>> it doesn't happen".
>>>>>
>>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>>>>> behind it. Since HNG has the potential to break down some of the
>>>>> deadlocks that have plagued the Hadoop community release process, there
>>>>> is considerable enthusiasm for it.
>>>>>
>>>>> Combined, these factors make it much more likely that HNG will be the
>>>>> dominant force in the Hadoop world. That is, more likely in my own
>>>>> estimation. Others may differ.
>>>>>
>>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm a newbie, and wonder what the main differences are between Hadoop
>>>>>> nextGen and Mesos.
>>>>>>
>>>>>> Thanks.
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
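
For anyone reading this thread later, the "resource offers" idea from Matei's message can be illustrated with a toy model. This is only a rough sketch under stated assumptions, not the Mesos API and not how YARN works: the names `Offer`, `Framework`, `consider`, and `offer_round` are all made up for illustration. The point it tries to show is that the master only advertises free resources, and each application applies its own criteria (here, a set of preferred nodes standing in for data locality) when deciding what to accept.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    """Free resources on one node, as advertised by the toy master (hypothetical shape)."""
    node: str
    cpus: int
    mem_mb: int


@dataclass
class Framework:
    """A hypothetical application scheduler that picks offers by its own criteria."""
    name: str
    preferred_nodes: set
    cpus_per_task: int
    mem_per_task_mb: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, offer):
        """Return how many tasks to launch on this offer; 0 means decline."""
        if self.pending_tasks == 0 or offer.node not in self.preferred_nodes:
            return 0
        # Launch as many tasks as fit in the offer, capped by what is pending.
        fit = min(offer.cpus // self.cpus_per_task,
                  offer.mem_mb // self.mem_per_task_mb,
                  self.pending_tasks)
        self.pending_tasks -= fit
        self.launched.extend([offer.node] * fit)
        return fit


def offer_round(free, frameworks):
    """One pass of the toy master: offer each node's free resources to each framework in turn."""
    for node, (cpus, mem_mb) in list(free.items()):
        for fw in frameworks:
            if cpus == 0 or mem_mb == 0:
                break
            accepted = fw.consider(Offer(node, cpus, mem_mb))
            # The master only tracks what is left; placement logic lives in the framework.
            cpus -= accepted * fw.cpus_per_task
            mem_mb -= accepted * fw.mem_per_task_mb
        free[node] = (cpus, mem_mb)
    return free


if __name__ == "__main__":
    free = {"node1": (4, 8192), "node2": (2, 4096)}
    batch = Framework("batch", {"node1"}, 1, 1024, 3)
    service = Framework("service", {"node1", "node2"}, 2, 2048, 2)
    print(offer_round(free, [batch, service]))
    print(batch.launched, service.launched)
```

In this sketch the master keeps no per-application placement state, which is the property the thread credits for scalability. A central scheduler in the HNG style would instead take explicit resource requests and make the placement decision itself.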
