I agree that "destructive" is not the correct word to describe features like snapshots and windows support. However, I also agree with Konstantin that any large feature will have a destabilizing effect on the code base, even if it is done on a branch and thoroughly tested before being merged in. HDFS HA from what I have seen and heard is rock solid, but it took a while to get there even after it was merged into branch-2. And we all know how long YARN and MRv2 have taken to stabilize.
I also agree that no one individual is able to police all of Hadoop. We have to rely on the committers to make sure that what is placed in a branch is appropriate for that branch in preparation for a release. As a community we need to decide what the goals of a branch are, so that I as a committer can know what is and is not appropriate to place in that branch. This is the reason why we are discussing API and binary compatibility, and the reason why I support having a vote for a release plan.

The question for the community comes down to this: do we want to release quickly and often off of trunk, trying hard to maintain compatibility between releases, or do we want to follow what we have done up to now, where a single branch goes into stabilization, trunk gets anything that is not "compatible" with that branch, and it takes a huge effort to switch momentum from one branch to another? Up to this point we have almost successfully done this switch once, from 1.0 to 2.0. I have a hard time believing that we are going to do this again in another 5 years.

There is nothing preventing the community from letting each organization decide what they want to do, so that we end up with both. But this results in fragmentation of the community, and makes it difficult for those trying to stabilize a release because there is no critical mass of individuals using and testing that branch. It also results in the scrambling we are seeing now to try and revert the incompatibilities between 1.0 and 2.0 that were introduced in the years between those releases. If we are going to do the same and make 3.0 compatible with 2.0 when the switch comes, why do we even allow any incompatible changes in at all? It just feels like trunk is a place to put tech debt that we are going to try and revert later.

I personally like the Linux and BSD models, where there is a merge window during which any new features can come in, and then the entire community works together to stabilize the release before going on to the next merge window. If the release does not stabilize quickly, the next merge window gets pushed back. I realize this is very different from the current model and is not likely to receive a lot of support, but it has worked for them for a long time, and they have code bases just as large as Hadoop's and even larger and more diverse communities.

I am +1 for Konstantin's release plan and will vote as such on that thread.

--Bobby

On 5/3/13 3:06 AM, "Konstantin Shvachko" <shv.had...@gmail.com> wrote:

>Hi Arun and Suresh,
>
>I am glad my choice of words attracted your attention. I consider this
>important for the project; otherwise I wouldn't waste everybody's time.
>You tend to react to the latest message taken out of context, which does
>not reveal the full picture.
>I'll try here to summarize my proposal and the motivation expressed
>earlier in these two threads:
>http://s.apache.org/fs
>http://s.apache.org/Streamlining
>
>I am advocating
>1. to make 2.0.5 a release that will
>   a) make any necessary changes so that the Hadoop APIs can be fixed
>after that
>   b) fix bugs: internal ones and those important for stabilizing
>downstream projects
>2. Release 2.1.0 stable, i.e. with both stable APIs and a stable code
>base.
>3. Produce a series of feature releases, potentially catching up with the
>state of trunk.
>4. Release from trunk afterwards.
>
>The main motivation for minimizing changes in 2.0.5 is to let Hadoop
>users and the downstream projects, that is the Hadoop community, start
>adapting to the new APIs asap.
>This will provide certainty that people can build their products on top
>of the 2.0.5 APIs with minimal risk that the next release will break
>them.
>Thus Bobby in http://goo.gl/jm5am
>is saying that the meaning of beta for him is locked-down APIs for wire
>and binary compatibility. For Hadoop, Yahoo using 2.x is an opportunity
>to have it tested at very large scale, which in turn will bring other
>users on board.
>
>I agree with Arun that we are not disagreeing on much, just on the order
>of execution: what goes first, stability or features.
>I am not challenging any features, the implementations, or the
>developers. But putting all the changes together is destructive for the
>stability of the release. Adding a 500 KB patch invalidates prior
>testing solely because it is a big change that needs testing not only by
>itself but with upstream applications.
>With 2.0.3 and 2.0.4 tested thoroughly and widely in many organizations
>and several distributions, this seems like a perfect base for the stable
>release. We could be just two steps away from it.
>
>I tried to explain as well as I could what I suggest, why, and why now.
>I am not here to police, claim, mandate, enforce edicts, be a
>gatekeeper, narrow view, tie up knots ... (did I miss any?). If we
>disagree, let's settle it by the rules we created for ourselves and move
>on. Life will self-adjust and the entropy will keep increasing no matter
>what.
>
>Thanks,
>--Konstantin