On Tue, Aug 28, 2012 at 4:12 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> On Aug 23, 2012, at 9:20 PM, Eli Collins wrote:
>
>> Per this thread [1] should we have a single set of committers for the
>> entire Hadoop project, ie all subprojects?
>
> I feel like we need to have a wider discussion here.
>
> This discussion started when a diverse set of folks working on YARN for a
> year and a half wanted their own identity and an acknowledgement of the fact
> that they are a distinct community. In retrospect, I went about convincing
> the wider Hadoop community about this in the wrong way. My apologies.
>
> Upon reflection, I think Chris Mattman has convinced me that we have an even
> wider issue at hand and that the right way to a better place, not just for
> YARN, but for all of Hadoop, is to expedite the process of graduating Hadoop
> sub-projects into TLPs. This is a mere reflection of the fact that Hadoop is
> not a single community.
>
> Historically there have been at least 2 communities (HDFS, MapReduce) under
> the Hadoop umbrella; and there now 3 (HDFS, MapReduce, YARN).
> At least for the last 3 years, if not more, the overwhelming majority of
> contributors to Hadoop have focussed exclusively on one of the sub-projects.
> That is a clear indicator.
> This is exactly the thinking behind graduating former sub-projects like
> HBase, Hive & Pig graduating, upon the nudge received by the Hadoop PMC from
> the Board.
>
> The good news is that, in principle, most seem to agree on the need for
> Hadoop sub-projects to stand alone and the path to get there. It could lead
> to several great outcomes such as ensuring HDFS pays equal attention to HBase
> as MapReduce, YARN pays attention to projects beyond MapReduce etc. by not
> tying them together.
>
> Rather than sweep this under the carpet, I feel we are better off
> acknowledging this.
>
> This is very much in keeping with the way the ASF and the Board wants to see
> communities - small and focussed on a single project.
>
> A meta or umbrella community like Hadoop leads to issues which are well
> documented and understood in the ASF, something experienced Apache Members
> like Chris Mattman have repeatedly pointed out.
>
> It is also fair, per Chris Douglas, to set a reasonable time frame. After due
> consideration, I think doing this before hadoop-2 is declared stable (GA) is
> the most reasonable option. It gives us necessary headroom hereupon and will
> ensure we don't confuse users further by doing it post-fact hadoop-2. Let's
> discuss the mechanics, timelines etc. further.
>
> Yes, this is hard work and there are several technical challenges. But, the
> ASF is all about communities and I'm sure we can solve these technical issues
> for a better long-term health of these distinct communities.
>
> Thoughts?
I'd start a separate discussion thread or vote about moving some or all of the sub-projects to TLPs.

IMO we should resolve this issue independently; there's no reason to block this decision on a possible future direction for the project. For example, if YARN spins out as a TLP, this issue still remains for the rest of the sub-projects, so I don't want to stall progress here on the larger, more complex discussion of whether all projects become TLPs.

And if a sub-project spins out as a TLP, that's a great opportunity to figure out the right set of committers. I.e., the decision here doesn't prevent YARN from establishing a new committer list if/when it spins out.

Thanks,
Eli