Re: [DISCUSSION] development process of Hadoop

Todd Lipcon Thu, 05 May 2011 10:53:11 -0700

On Thu, May 5, 2011 at 10:32 AM, Eric Yang <[email protected]> wrote:


> Git is powerful for maintaining different branches of the source code.
> However, it will only work if the entire community is willing to move to
> git.  Maintaining an svn and git hybrid is a time-consuming task for which
> we are paying full price.  The Hadoop community should work smarter on source
> control.  What do people think about fully adopting git instead of svn?
>

+1 for Git as a tool. But using git makes it even _more_ important that we
have a clearly defined release process that outlines which branches are
meant to be released as official artifacts, and what the inclusion criteria
for those branches should be.

-Todd


> On 5/5/11 4:35 AM, "Steve Loughran" <[email protected]> wrote:
>
> On 05/05/11 10:51, Tony Valderrama wrote:
> > Hi, I just wanted to drop in a few thoughts from a new developer
> > working outside of the Hadoop developer community.
> >
> > On Wed, May 4, 2011 at 7:39 PM, Eric Yang<[email protected]>  wrote:
> >> While the world demands agility, the "review then commit" process is
> >> preventing progress from happening.  People end up having to generate
> >> multiple versions of patches to ensure the code can be applied.  The
> >> large lag time between patch generation and review is taking a
> >> significant toll on the community and progress.
> >
> >> Yahoo has a great team of developers who improve Hadoop at a faster
> >> pace with its own fork of the source code.  The reason that Yahoo was
> >> able to achieve faster improvement with features was the ability to
> >> use source code repository tools properly.  Unfortunately for Yahoo,
> >> their source code repository was not Apache svn trunk.
> >
> > I agree that the review process is broken.  However, the current
> > situation is exactly the result of a lack of adherence to this and
> > other processes.  Various subgroups within the community have
> > (intentionally or unintentionally) hijacked the project at different
> > times by avoiding community processes in the interest of agility or
> > commercial benefit, and the result is a highly fragmented project with
> > no clear direction.
> >
> > From the outside, Hadoop looks like a Yahoo/Cloudera project which
> > occasionally gets an Apache stamp.  Given the lack of adherence to
> > processes, as a non-Yahoo/Cloudera developer I have no way of breaking
> > into the development community.  Who's going to review or commit
> > patches I submit?  And which of the myriad versions should I even be
> > trying to patch against?  And given the speed with which undocumented
> > changes are being made, how am I supposed to figure out if my changes
> > are going to be relevant or viable next week?  We'd love to contribute
> > back, but it's just not clear that we or other small players have any
> > place within the Hadoop developer community.
>
> As someone who has commit rights but undercommits, here are my issues
>  -I am not full time on hadoop, I have little time to keep my own code
> up to date, let alone review patches
>  -I am not fully up to date with all the changes or subtleties in what
> is a big, complicated system
>  -I don't want to break the big systems (Y!, Facebook) by introducing
> changes that work on my network and my (small, dynamic) clusters but
> which place limitations on scale. It's why I prefer review by those
> people who do work on large scale projects.
>
> >
> >> Use JIRA if there is a large feature set that requires brainstorming,
> >> and developers should have the ability to make small incremental
> >> changes without RTC.  This will ensure developers help each other
> >> rather than policing each other.
> >
> > As an outsider, JIRA is the only way I've been able to follow the
> > changes to Hadoop's code and guess where the project is heading.
> > Permitting developers to commit without review or documentation will
> > just further exclude anyone who can't walk down the hall and knock on
> > an office door to ask about a commit.
>
> I've worked in other ASF projects (Axis) where some large dev teams
> (IBM) used to make decisions in team meetings and propagate them. It's
> faster, but less community centric, and when a large dev team (IBM) get
> re-assigned internally everyone is left not just scrambling to catch up
> engineering-wise, but also to make sense of big chunks of
> under-documented code. At least the JIRA-based review process provides
> a discussion log, and Hudson/Jenkins checks that there are tests,
> no extra warnings, etc.
>
> What could be interesting would be
>  -a move to Git to make it easier to pull in patches from other
> branches, and for people like Tony to have their own fork under SCM.
>  -adoption of Gerrit for having each JIRA issue move from being a patch
> to a branch (local or remote), so that people can develop the code for
> an issue, others can pull it in and merge it, and so that the issue
> tracks live code, not dead patches
>  -more testing of trunk in bigger real/virtual clusters
>
> I don't know how we can do this; I'd love to hear about experiences
> others have with such a process.
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
