On Aug 23, 2012, at 9:20 PM, Eli Collins wrote:

> Per this thread [1] should we have a single set of committers for the
> entire Hadoop project, ie all subprojects?

I feel like we need to have a wider discussion here.

This discussion started when a diverse set of folks working on YARN for a year 
and a half wanted their own identity and an acknowledgement of the fact that 
they are a distinct community. In retrospect, I went about convincing the wider 
Hadoop community about this in the wrong way. My apologies.

Upon reflection, I think Chris Mattmann has convinced me that we have an even 
wider issue at hand and that the right way to a better place, not just for 
YARN, but for all of Hadoop, is to expedite the process of graduating Hadoop 
sub-projects into TLPs. This is a mere reflection of the fact that Hadoop is 
not a single community.

Historically there have been at least 2 communities (HDFS, MapReduce) under the 
Hadoop umbrella; and there are now 3 (HDFS, MapReduce, YARN).
At least for the last 3 years, if not more, the overwhelming majority of 
contributors to Hadoop have focussed exclusively on one of the sub-projects. 
That is a clear indicator.
This is exactly the thinking behind former sub-projects like HBase, Hive & Pig 
graduating, upon the nudge received by the Hadoop PMC from the Board.

The good news is that, in principle, most seem to agree on the need for Hadoop 
sub-projects to stand alone and the path to get there. Not tying them together 
could lead to several great outcomes, such as ensuring HDFS pays as much 
attention to HBase as to MapReduce, and YARN pays attention to projects beyond 
MapReduce.

Rather than sweep this under the carpet, I feel we are better off acknowledging 
this.

This is very much in keeping with the way the ASF and the Board want to see 
communities - small and focussed on a single project.

A meta or umbrella community like Hadoop leads to issues which are well 
documented and understood in the ASF, something experienced Apache Members like 
Chris Mattmann have repeatedly pointed out.

It is also fair, per Chris Douglas, to set a reasonable time frame. After due 
consideration, I think doing this before hadoop-2 is declared stable (GA) is 
the most reasonable option. It gives us the necessary headroom and will ensure 
we don't confuse users further by doing it after hadoop-2 is released. Let's 
discuss the mechanics, timelines etc. further.

Yes, this is hard work and there are several technical challenges. But, the ASF 
is all about communities and I'm sure we can solve these technical issues for 
the better long-term health of these distinct communities.

Thoughts?

thanks,
Arun
