I wouldn't say it's designed for Yahoo! only, but it's definitely meant to 
solve issues they saw with large Hadoop clusters (and provides a lot of value 
for that).

Matei

On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:

> Hmm, HNG seems designed for their own (Y!) circumstances.
> 
> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]> wrote:
>> Ted brought up some superficial differences, but if you want to understand 
>> technical differences, there are a bunch of those as well. Mesos and Hadoop 
>> next-gen have similar goals (more efficient resource sharing for data 
>> centers), but they are coming at it from different angles -- HNG is 
>> currently mainly focusing on MapReduce and aims to support other types of 
>> applications too, while Mesos was meant to support a very diverse set of 
>> applications, including long-running services and batch jobs (rather than 
>> only multiple instances of MapReduce), and is in fact being used for that 
>> already. More importantly, HNG is really two pieces -- a refactoring of 
>> MapReduce to allow one instance of MR per application, and a resource 
>> manager called YARN that lets these instances coordinate. We are going to 
>> support having the new MR2 application masters run on top of Mesos instead 
>> of YARN too (and indeed the refactoring is nice because it will enable 
>> Hadoop MapReduce to run on other cluster scheduling systems in the future).
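The split described above (one MapReduce application master per job, with the cluster scheduler behind a separate interface) is what lets MR2 masters run on YARN or on another scheduler such as Mesos. Below is a minimal Python sketch of that decoupling. `ResourceManager`, `StaticPoolManager`, `Container` and `MapReduceAppMaster` are hypothetical names for illustration, not the real YARN, Mesos or MR2 APIs. The toy backend simply hands out fixed-size containers from a static pool of nodes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical: a slice of one node's resources granted to an application."""
    node: str
    cpus: float
    mem_mb: int


class ResourceManager(ABC):
    """Hypothetical interface the application master depends on. A YARN or
    Mesos adapter would each implement it; the master never sees which one."""

    @abstractmethod
    def allocate(self, cpus: float, mem_mb: int, count: int) -> list[Container]:
        ...


class StaticPoolManager(ResourceManager):
    """Toy backend: packs containers onto a fixed set of nodes, first fit."""

    def __init__(self, nodes: dict[str, tuple[float, int]]):
        self.free = dict(nodes)  # node -> (free cpus, free memory in MB)

    def allocate(self, cpus: float, mem_mb: int, count: int) -> list[Container]:
        granted: list[Container] = []
        for node, (c, m) in self.free.items():
            # Keep granting on this node while it still fits and demand remains.
            while len(granted) < count and c >= cpus and m >= mem_mb:
                c, m = c - cpus, m - mem_mb
                granted.append(Container(node, cpus, mem_mb))
            self.free[node] = (c, m)
        return granted


class MapReduceAppMaster:
    """Hypothetical per-job master: one instance per application, talking
    only to the ResourceManager interface."""

    def __init__(self, rm: ResourceManager, num_tasks: int):
        self.rm = rm
        self.num_tasks = num_tasks

    def run(self) -> list[str]:
        containers = self.rm.allocate(cpus=1.0, mem_mb=1024, count=self.num_tasks)
        return [f"task{i}@{c.node}" for i, c in enumerate(containers)]


rm = StaticPoolManager({"n1": (2.0, 4096), "n2": (1.0, 2048)})
print(MapReduceAppMaster(rm, num_tasks=3).run())
# ['task0@n1', 'task1@n1', 'task2@n2']
```

Swapping in a different backend (for example an adapter over a real cluster scheduler) would not require changing `MapReduceAppMaster`, which is the benefit of the refactoring described in the message.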
>> 
>> In terms of the technical differences, here are some of the main ones 
>> currently:
>> 
>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and 
>> Python in addition to Java.
>> 
>> - The resource allocation models are different: HNG has a central scheduler 
>> that supports data locality constraints, while Mesos makes "resource 
>> offers" that let applications pick the resources they want according to 
>> their own criteria, and applications can also use requests/filters to 
>> describe which resources they want to be offered. Our belief is that 
>> resource offers will allow Mesos to support a wider range of application 
>> scheduling needs, while simultaneously making the system more scalable and 
>> highly available (minimizing the state and work required of the master).
>> 
>> - Mesos can enforce resource isolation through Linux Containers to guard 
>> against misbehaving / greedy tasks.
>> 
>> - HNG supports Kerberos authentication for users.
>> 
>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, 
>> Spark and MPI.
>> 
>> - There are some smaller architectural differences that may matter for some 
>> applications, such as communication being based on message-passing in Mesos 
>> vs periodic heartbeats in HNG, which allows Mesos to provide lower 
>> scheduling latencies (e.g. to still be efficient if your tasks take 100ms 
>> each).
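A back-of-the-envelope model shows why heartbeat-based scheduling hurts very short tasks. It assumes, purely for illustration, that a freed slot is refilled only at the next periodic heartbeat (an average wait of half the interval), while message passing refills it after a small fixed delay. Slot utilization is then T / (T + wait) for tasks of length T. The 3 s heartbeat interval and 5 ms message delay below are assumed values, not measurements of HNG or Mesos.

```python
def slot_utilization(task_s: float, wait_s: float) -> float:
    """Fraction of time a slot runs tasks if each task is preceded by an
    idle gap of wait_s seconds before the scheduler refills the slot."""
    return task_s / (task_s + wait_s)


def heartbeat_wait(interval_s: float) -> float:
    """Average idle gap when a slot frees at a uniformly random moment and
    is only refilled at the next periodic heartbeat."""
    return interval_s / 2


TASK_S = 0.1        # 100 ms tasks, the example from the message
HEARTBEAT_S = 3.0   # assumed heartbeat interval (illustrative only)
MESSAGE_S = 0.005   # assumed event-driven reassignment delay (illustrative only)

print(f"heartbeat: {slot_utilization(TASK_S, heartbeat_wait(HEARTBEAT_S)):.3f}")
print(f"message:   {slot_utilization(TASK_S, MESSAGE_S):.3f}")
```

Under these assumptions, 100 ms tasks keep a slot busy only about 6% of the time with heartbeats versus about 95% with message passing; the gap shrinks as tasks get longer relative to the heartbeat interval.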
>> 
>> However, overall, as Ted said, many of these differences will likely go away 
>> as both projects add features. What will be interesting is whether some 
>> fundamental differences in the target workloads remain, which I think is 
>> likely to happen. For example, the main deployment of Mesos is currently to 
>> run long-running stream processing services at Twitter, which is something 
>> that typical Hadoop environments just don't do and that requires different 
>> things from the cluster scheduler. I also believe we're going to see a lot 
>> of other cluster scheduling systems besides Mesos and HNG in the future, as 
>> people's requirements for these systems grow. There are some very 
>> challenging problems in designing a general cluster scheduling system that 
>> even the Google folks are still working hard on.
>> 
>> Matei
>> 
>> 
>> 
>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>> 
>>> Thanks for your nice and quick explanation!
>>> 
>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>>>> Technically speaking, Mesos has a less expressive model for expressing
>>>> resource requirements.  The thesis of Mesos is that the negotiation between
>>>> application and scheduler can make up for this missing information.  Mesos
>>>> was also first to "market", but Hadoop nextGen is catching up fast.
>>>> MR-279 has code that works, albeit with some issues in production use.
>>>> From all reports, these issues are being resolved quickly as Yahoo's
>>>> considerable QA resources come to bear.
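The negotiation Ted describes is the "resource offers" model from Matei's reply: the master offers free resources, and each framework accepts or declines them by its own criteria, so the master needs no detailed model of each job's requirements. The Python sketch below is a toy illustration of that idea only. `Offer`, `LocalityFramework` and `offer_round` are hypothetical names that do not mirror the real Mesos API, and the data-locality preference is just one example of a framework-side criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical: free CPUs on one node, offered by the master."""
    node: str
    cpus: float


@dataclass
class LocalityFramework:
    """Hypothetical framework that only accepts offers on nodes holding its
    input data, i.e. it applies its own placement criterion."""
    name: str
    preferred_nodes: set[str]
    tasks_left: int
    cpus_per_task: float = 1.0
    launched: list[str] = field(default_factory=list)

    def on_offer(self, offer: Offer) -> int:
        """Return how many tasks to launch on this offer (0 means decline)."""
        if offer.node not in self.preferred_nodes:
            return 0
        n = min(self.tasks_left, int(offer.cpus // self.cpus_per_task))
        self.tasks_left -= n
        self.launched += [offer.node] * n
        return n


def offer_round(free: dict[str, float], frameworks: list[LocalityFramework]) -> None:
    """Master side: offer each node's free CPUs to frameworks in turn. The
    master keeps no model of what frameworks want; it only learns from
    their accept/decline answers."""
    for node in free:
        for fw in frameworks:
            if free[node] <= 0:
                break
            used = fw.on_offer(Offer(node, free[node]))
            free[node] -= used * fw.cpus_per_task


free = {"n1": 4.0, "n2": 4.0, "n3": 4.0}
a = LocalityFramework("A", preferred_nodes={"n1"}, tasks_left=6)
b = LocalityFramework("B", preferred_nodes={"n1", "n3"}, tasks_left=3)
offer_round(free, [a, b])
print(a.launched, b.launched, free)
```

Here framework A declines the offers on n2 and n3 and keeps two tasks pending for a later offer on n1 rather than running them non-locally; that waiting policy is a choice of this sketch, not something either system mandates.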
>>>> 
>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>>>> outward appearances, indicates a nearly inactive project.  There is some
>>>> evidence that considerable activity is occurring off-list, but this is a
>>>> process bug in the Apache model since "if it doesn't happen on the list, it
>>>> doesn't happen".
>>>> 
>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>>>> behind it.  Since HNG has the potential to break down some of the deadlocks
>>>> that have plagued the Hadoop community release process, there is
>>>> considerable enthusiasm for it.
>>>> 
>>>> Combined, these factors make it much more likely that HNG will be the
>>>> dominant force in the Hadoop world.  That is, more likely in my own
>>>> estimation.  Others may differ.
>>>> 
>>>> 
>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <[email protected]> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm a newbie, and I wonder what the main differences are between Hadoop
>>>>> nextGen and Mesos.
>>>>> 
>>>>> Thanks.
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>> 
>> 
> 
> 
> 
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
