On Thu, May 5, 2011 at 10:32 AM, Eric Yang <[email protected]> wrote:
> Git is powerful in maintaining different branch of the source code.
> However, it will only work if the entire community is willing to move to
> git. Maintaining svn and git hybrid, is a time consuming task that we are
> paying in full price. Hadoop community should work smarter for the source
> control. What do people think about fully adopting git instead of svn?
>

+1 for Git as a tool. But using git makes it even _more_ important that we
have a clearly defined release process that outlines which branches are
meant to be released as official artifacts, and what the inclusion criteria
for those branches should be.

-Todd

> On 5/5/11 4:35 AM, "Steve Loughran" <[email protected]> wrote:
>
> On 05/05/11 10:51, Tony Valderrama wrote:
> > Hi, I just wanted to drop in a few thoughts from a new developer
> > working outside of the Hadoop developer community.
> >
> > On Wed, May 4, 2011 at 7:39 PM, Eric Yang<[email protected]> wrote:
> >> While the world demand agility, the "review then commit" process is
> >> preventing progress from happening. People end up having to generate
> >> multiple version of patches to ensure the code can be applied. The
> >> large lag time between patch generation and reviewed is taking
> >> significant toll on the community and progress.
> >
> >> Yahoo have a great team of developers who improves Hadoop at faster
> >> pace with its own fork of the source code. The reason that Yahoo was
> >> able to achieve faster improvement with features was due to the
> >> ability to use source code repository tools properly. Unfortunate
> >> for Yahoo, their source code repository was not Apache svn trunk.
> >
> > I agree that the review process is broken. However, the current
> > situation is exactly the result of a lack of adherence to this and
> > other processes. Various subgroups within the community have
> > (intentionally or unintentionally) hijacked the project at different
> > times by avoiding community processes in the interest of agility or
> > commercial benefit, and the result is a highly fragmented project with
> > no clear direction.
> >
> > From the outside, Hadoop looks like a Yahoo/Cloudera project which
> > occasionally gets an Apache stamp. Given the lack of adherence to
> > processes, as a non-Yahoo/Cloudera developer I have no way of breaking
> > into the development community. Who's going to review or commit
> > patches I submit? And which of the myriad versions should I even be
> > trying to patch against? And given the speed with which undocumented
> > changes are being made, how am I supposed to figure out if my changes
> > are going to be relevant or viable next week? We'd love to contribute
> > back, but it's just not clear that we or other small players have any
> > place within the Hadoop developer community.
>
> As someone who has commit rights but undercommits, here are my issues
> -I am not full time on hadoop, I have little time to keep my own code
> up to date, let alone review patches
> -I am not fully up to date with all the changes or subtleties in what
> is a big, complicated system
> -I don't want to break the big systems (Y!, Facebook) by introducing
> changes that work on my network and my (small, dynamic) clusters but
> which place limitations on scale. It's why I prefer review by those
> people who do work on large scale projects.
>
> >
> >> Use JIRA, if there is large feature set that requires brain storming,
> >> and developers should have the ability to make small incremental
> >> changes without RTC. This will ensure developers help each other
> >> rather than policing each other.
> >
> > As an outsider, JIRA is the only way I've been able to follow the
> > changes to Hadoop's code and guess where the project is heading.
> > Permitting developers to commit without review or documentation will
> > just further exclude anyone who can't walk down the hall and knock on
> > an office door to ask about a commit.
>
> I've worked in other ASF projects (Axis) where some large dev teams
> (IBM) used to make decisions in team meetings and propagate them. It's
> faster, but less community centric, and when a large dev team (IBM) get
> re-assigned internally everyone is left not just scrambling to catch up
> engineering-wise, but also to make sense of big chunks of
> under-documented code. At least the JIRA-based review process not only
> provides a discussion log, Hudson/Jenkins checks that there are tests,
> no extra warnings, etc.
>
> What could be interesting would be
> -a move to Git to make it easier to pull in patches from other
> branches, and for people like Tony to have their own fork under SCM.
> -adoption of Gerrit for having each JIRA issue move from being a patch
> to a branch (local or remote), so that people can develop the code for
> an issue, others can pull it in and merge it, and so that the issue
> tracks live code, not dead patches
> -more testing of trunk in bigger real/virtual clusters
>
> I don't know how we can do this, I'd love to hear about experiences
> others have with such a process.
>
>

--
Todd Lipcon
Software Engineer, Cloudera
