Re: [DISCUSSION] development process of Hadoop

Steve Loughran Thu, 05 May 2011 04:36:06 -0700

On 05/05/11 10:51, Tony Valderrama wrote:

Hi, I just wanted to drop in a few thoughts from a new developer
working outside of the Hadoop developer community.


On Wed, May 4, 2011 at 7:39 PM, Eric Yang wrote:

While the world demands agility, the "review then commit" process is preventing
progress from happening. People end up having to generate multiple versions of
patches to ensure the code can still be applied. The long lag between patch
generation and review is taking a significant toll on the community and on progress.

Yahoo has a great team of developers who improve Hadoop at a faster pace with
their own fork of the source code. The reason Yahoo was able to deliver feature
improvements faster was its ability to use source code repository tools properly.
Unfortunately for Yahoo, their source code repository was not Apache svn trunk.


I agree that the review process is broken.  However, the current
situation is exactly the result of a lack of adherence to this and
other processes.  Various subgroups within the community have
(intentionally or unintentionally) hijacked the project at different
times by avoiding community processes in the interest of agility or
commercial benefit, and the result is a highly fragmented project with
no clear direction.

From the outside, Hadoop looks like a Yahoo/Cloudera project which
occasionally gets an Apache stamp.  Given the lack of adherence to
processes, as a non-Yahoo/Cloudera developer I have no way of breaking
into the development community.  Who's going to review or commit
patches I submit?  And which of the myriad versions should I even be
trying to patch against?  And given the speed with which undocumented
changes are being made, how am I supposed to figure out if my changes
are going to be relevant or viable next week?  We'd love to contribute
back, but it's just not clear that we or other small players have any
place within the Hadoop developer community.


As someone who has commit rights but undercommits, here are my issues:

-I am not full time on Hadoop; I have little time to keep my own code up to date, let alone review patches.
-I am not fully up to date with all the changes or subtleties in what is a big, complicated system.
-I don't want to break the big systems (Y!, Facebook) by introducing changes that work on my network and my (small, dynamic) clusters but which place limitations on scale. That's why I prefer review by people who do work on large-scale projects.

Use JIRA if there is a large feature set that requires brainstorming, but
developers should have the ability to make small incremental changes without
RTC (review-then-commit). This will ensure developers help each other rather
than police each other.


As an outsider, JIRA is the only way I've been able to follow the
changes to Hadoop's code and guess where the project is heading.
Permitting developers to commit without review or documentation will
just further exclude anyone who can't walk down the hall and knock on
an office door to ask about a commit.

I've worked in other ASF projects (Axis) where some large dev teams (IBM) used to make decisions in team meetings and propagate them. It's faster, but less community-centric, and when a large dev team (IBM) gets re-assigned internally, everyone is left not just scrambling to catch up engineering-wise, but also trying to make sense of big chunks of under-documented code. At least the JIRA-based review process provides a discussion log, and Hudson/Jenkins checks that there are tests, no extra warnings, etc.


What could be interesting would be:

-a move to Git, to make it easier to pull in patches from other branches, and for people like Tony to have their own fork under SCM.
-adoption of Gerrit, so that each JIRA issue moves from being a patch to a branch (local or remote); people can develop the code for an issue, others can pull it in and merge it, and the issue tracks live code, not dead patches.
-more testing of trunk in bigger real/virtual clusters.

I don't know how we can do this; I'd love to hear about experiences others have had with such a process.
