Understood.

On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <[email protected]> wrote:
> I wouldn't say it's designed for Yahoo! only, but it's definitely meant to 
> solve issues they saw with large Hadoop clusters (and provides a lot of value 
> for that).
>
> Matei
>
> On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
>
>> Hmm, HNG seems designed for their (Y!) own circumstances.
>>
>> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]> 
>> wrote:
>>> Ted brought up some superficial differences, but if you want to understand 
>>> technical differences, there are a bunch of those as well. Mesos and Hadoop 
>>> next-gen have similar goals (more efficient resource sharing for data 
>>> centers), but they are coming at it from different angles -- HNG is 
>>> currently mainly focusing on MapReduce and aims to support other types of 
>>> applications too, while Mesos was meant to support a very diverse set of 
>>> applications, including long-running services and batch jobs (rather than 
>>> only multiple instances of MapReduce), and is in fact being used for that 
>>> already. More importantly, HNG is really two pieces -- a refactoring of 
>>> MapReduce to allow one instance of MR per application, and a resource 
>>> manager called YARN that lets these instances coordinate. We are going to 
>>> support having the new MR2 application masters run on top of Mesos instead 
>>> of YARN too (and indeed the refactoring is nice because it will enable 
>>> Hadoop MapReduce to run on other cluster scheduling systems in the future).
>>>
>>> In terms of the technical differences, here are some of the main ones 
>>> currently:
>>>
>>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and 
>>> Python in addition to Java.
>>>
>>> - The resource allocation models are different: HNG has a central scheduler 
>>> that supports data locality constraints, while Mesos provides "resource 
>>> offers" that let applications pick the resources they want according to 
>>> criteria beyond data locality, plus requests/filters that describe which 
>>> resources an application wants to be offered. Our belief is that resource 
>>> offers will allow Mesos to support a wider range of application scheduling 
>>> needs, while simultaneously making the system more scalable and highly 
>>> available (minimizing the state and work required of the master).
>>>
>>> - Mesos can enforce resource isolation through Linux Containers to guard 
>>> against misbehaving / greedy tasks.
>>>
>>> - HNG supports Kerberos authentication for users.
>>>
>>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, 
>>> Spark and MPI.
>>>
>>> - There are some smaller architectural differences that may matter for some 
>>> applications, such as communication being based on message-passing in Mesos 
>>> vs periodic heartbeats in HNG, which allows Mesos to provide lower 
>>> scheduling latencies (e.g. to still be efficient if your tasks take 100ms 
>>> each).
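
The resource-offer model described in the list above can be illustrated with a
toy simulation. This is only a minimal sketch in Python and is not the Mesos
API: the Offer, LocalityFramework, and Master names, the on_offer callback, and
the node sizes are all hypothetical, chosen for illustration. It assumes the
master tracks nothing but free resources per node, and that each framework
decides for itself which offers to accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Free resources on one node, as offered to a framework (hypothetical type)."""
    node: str
    cpus: int
    mem_mb: int


class LocalityFramework:
    """Hypothetical framework that only accepts offers on nodes it prefers."""

    def __init__(self, name, preferred_nodes, task_cpus, task_mem_mb):
        self.name = name
        self.preferred_nodes = set(preferred_nodes)
        self.task_cpus = task_cpus
        self.task_mem_mb = task_mem_mb

    def on_offer(self, offer):
        """Return a list of (cpus, mem_mb) tasks to launch; an empty list declines."""
        if offer.node not in self.preferred_nodes:
            return []
        count = min(offer.cpus // self.task_cpus,
                    offer.mem_mb // self.task_mem_mb)
        return [(self.task_cpus, self.task_mem_mb)] * count


class Master:
    """Toy master: holds only the free resources and forwards offers."""

    def __init__(self, free):
        self.free = dict(free)  # node name -> (free cpus, free mem_mb)

    def offer_round(self, frameworks):
        """Offer each node's free resources to each framework in turn."""
        launched = []
        for fw in frameworks:
            for node in list(self.free):
                cpus, mem_mb = self.free[node]
                if cpus == 0 or mem_mb == 0:
                    continue  # nothing left to offer on this node
                tasks = fw.on_offer(Offer(node, cpus, mem_mb))
                if not tasks:
                    continue  # declined: resources stay free for the next framework
                used_cpus = sum(t[0] for t in tasks)
                used_mem = sum(t[1] for t in tasks)
                self.free[node] = (cpus - used_cpus, mem_mb - used_mem)
                launched.append((fw.name, node, len(tasks)))
        return launched


# Example: framework A only wants node1; framework B takes whatever is left.
master = Master({"node1": (4, 8192), "node2": (4, 8192)})
frameworks = [
    LocalityFramework("A", ["node1"], task_cpus=2, task_mem_mb=2048),
    LocalityFramework("B", ["node1", "node2"], task_cpus=1, task_mem_mb=1024),
]
launched = master.offer_round(frameworks)
```

In this sketch the placement logic lives entirely in on_offer, so the master
never needs to understand a framework's constraints. That is the property the
offer model relies on to keep master state and work small.
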
>>>
>>> However, overall, as Ted said, many of these differences will likely go 
>>> away as both projects add features. What will be interesting is whether 
>>> some fundamental differences in the target workloads remain, which I think 
>>> is likely to happen. For example, the main deployment of Mesos is currently 
>>> to run long-running stream processing services at Twitter, which is 
>>> something that typical Hadoop environments just don't do and that requires 
>>> different things from the cluster scheduler. I also believe we're going to 
>>> see a lot of other cluster scheduling systems besides Mesos and HNG in the 
>>> future, as people's requirements for these systems grow. There are some 
>>> very challenging problems in designing a general cluster scheduling system 
>>> that even the Google folks are still working hard on.
>>>
>>> Matei
>>>
>>>
>>>
>>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>>>
>>>> Thanks for your nice and quick explanation!
>>>>
>>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>>>>> Technically speaking, Mesos has a less expressive model for describing
>>>>> resource requirements.  The thesis of Mesos is that the negotiation between
>>>>> application and scheduler can make up for this missing information.  Mesos
>>>>> was also first to "market", but Hadoop nextGen is catching up fast.
>>>>> MR-279 has code that works, albeit with some issues in production use.
>>>>> From all reports, these issues are being resolved quickly as Yahoo's
>>>>> considerable QA resources are brought to bear.
>>>>>
>>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>>>>> outward appearances, indicates a nearly inactive project.  There is some
>>>>> evidence that considerable activity is occurring off-list, but this is a
>>>>> process bug in the Apache model since "if it doesn't happen on the list,
>>>>> it doesn't happen".
>>>>>
>>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>>>>> behind it.  Since HNG has the potential to break down some of the deadlocks
>>>>> that have plagued the Hadoop community release process, there is
>>>>> considerable enthusiasm for it.
>>>>>
>>>>> Combined, these factors make it much more likely that HNG will be the
>>>>> dominant force in the Hadoop world.  That is, more likely in my own
>>>>> estimation.  Others may differ.
>>>>>
>>>>>
>>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon 
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm a newbie, and I wonder what the main differences between Hadoop
>>>>>> nextGen and Mesos are.
>>>>>>
>>>>>> Thanks.
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>
>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
