On 26 November 2012 21:25, Radim Kolar <h...@filez.com> wrote:

>> The main "feature" is that when you get the +1 vote you yourself get to
>> deal with the grunge work of apply patches to one or more svn branches,
>> resyncing that with the git branches you inevitably do your own work on.
>
> no, main feature is major speed advantage. It takes forever to get
> something committed. I was annoyed with apache nutch last year and forked
> it, here is snapshot from forked codebase
> http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
> now its 160k LOC on top of apache nutch 1.4. If i worked with these guys,
> it would be never done because it took them 4 months to get 200 lines patch
> reviewed.

I'm sorry you missed the bit in my slides where I emphasised that
review-then-commit is the same rule even if you are a committer. It's not
like you can suddenly put changes in without having gone through the JIRA
circuit. I also tried to explain why the project is so rigorous:
the value of Hadoop is the data stored in HDFS. Imagine someone could put
some minor bit of tuning in there that sped up their cluster slightly, but
increased the risk of data loss. Or something in the MR layer that
introduced enough of a performance overhead that someone like Facebook
would have to buy an extra rack of machines. That's why there's a review
process. Try getting a patch into ext4 or the Linux kernel scheduler and
see if it's any easier.

> Hadoop has huge backlog of patches, you need way more committers then you
> have today. I simply could not assign person to working on hadoop fulltime
> because if he submits mere 5 patches per day, you will be never able to
> process them.

The bottleneck is not the number of committers; it is the number of people
who understand Hadoop well enough to provide adequate reviews, and who have
the time to review patches thoroughly, especially the big ones. I think
that is a real problem.

> Your current development process fail to scale. What are your plans for
> moving development faster?

I don't disagree. Again, in my slides I tried to make some proposals.

1. Even if the source stays in SVN, we could use a git-style workflow of
   pull requests and Gerrit/GitHub code reviewing.
2. Better distributed development events, where a group of people can go
   online via a Google+ Hangout and work together on a specific problem in
   real time.
3. More rigorous "review Sundays" or similar, where we go through the
   review queue on a free weekend day and see what can be done about the
   patches in it.
4. Some kind of mentorship process to work with people on larger projects.
   Again, time is the constraint here.

If you've got some other ideas, it'd be good to know them.