Todd,

On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:

> I'd actually contend that YARN was merged too early. I have yet to see
> anyone running YARN in production, and it's holding up the "Stable"
> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
> I'm seeing fewer issues in our customers running Hadoop HDFS 2
> compared to Hadoop 1-derived code.

You know I respect you a ton, but I'm very saddened to see you perpetuate this 
FUD on our public lists. I expected better, particularly when everyone is 
working towards the same goals of advancing Hadoop-2. This sniping on other 
members doing work is, um, I'll just stop here rather than regret later.

I'm pretty sure you realize this (we've talked about this privately), yet, for 
other users who might not be aware:
# YARN has been deployed, by almost everyone's standards, on a very LARGE 
~450-node cluster for 6 months now at Yahoo.
# The entire YARN & MapReduce developer community has done an enormous amount 
of testing, compatibility work and performance work for many months now. It's 
been clear that YARN/MRv2 is superior to MR1 on every dimension - performance 
(2x in several cases), scale etc.; all dimensions which are critical for 
Hadoop's success in the past and future.
# Not just MR, this work has been done across the stack - Pig, Oozie, HCatalog 
etc. This has been an enormous amount of work not just by YARN/MRv2, but by all 
these communities.
# Many thousands of unique end-user applications at Yahoo have *certified* 
YARN/MRv2. That is pretty much *all* MapReduce, Pig etc. applications at Yahoo 
- the most advanced Hadoop deployment in the world.
# It is now *days* away from being deployed on one of the largest and most 
demanding Hadoop clusters in the world with several *thousand* nodes and 
millions of applications per month. See Bobby's note if you don't believe me.

Notice, I didn't talk about any of the other benefits of YARN, such as running 
frameworks other than MR - you'll see more of this, such as real-time 
applications on Hadoop clusters, over the coming months. For example, see 
discussions on the Storm/S4 lists about YARN prototypes at various stages of 
availability.

Paying you back in the same coin, after being declared *done*, HDFS2 had 
several BASIC issues such as a non-working upgrade from hadoop-1 (HDFS-3731, 
HDFS-3579) or edit-log corruption (HDFS-3626). Maybe you or the customers you 
talk about don't care about it, whatever. For example, is the QJM work part of 
stable HDFS2? It's not even code complete yet.

IAC, it's pretty obvious we have different standards for declaring HDFS stable 
v/s YARN/MRv2 as stable. The standards I'm used to, having been around since the 
dawn of this project, are what I use to measure stability, i.e. deployed and 
stable for weeks/months on some of the largest Hadoop clusters in the world 
before letting it loose on other 'customers'.

Given that upgrade failures or data corruption are acceptable, is YARN 'stable'? 
By the same standards - YES! - for many months now, much before HDFS HA was 
even code complete!

I don't want to engage in a debate on this further or expect you to care about 
YARN/MRv2, but please, for heaven's sake, do not publicly diss the work so many 
people have done for many, many months now or accuse them of *holding up 
Hadoop* - it's very poor form. 

I'm very proud to have contributed to this effort, even more to have worked 
with such a talented and dedicated bunch. An acknowledgement would be nice, but 
the least I/we *do* expect is absence of public sniping by other members of the 
Hadoop community.

respectfully,
Arun
