Hey guys,

I think we diverged a bit from the initial topic of this discussion, which is removing branch-2.10 and changing the version of branch-2 from 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT. It sounds like the subject line of this thread, "Making 2.10 the last minor 2.x release", confused people. That is in fact a wider matter, which can be discussed when somebody actually proposes to release 2.11, and as I understand it nobody does at the moment.
So if anybody objects to removing branch-2.10, please make an argument. Otherwise we should go ahead and just do it next week. I see people still struggling to keep branch-2 and branch-2.10 in sync.

Thanks,
--Konstantin

On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung <jyhung2...@gmail.com> wrote:

> Thanks for the detailed thoughts, everyone.
>
> Eric (Badger), my understanding is the same as yours re. minor vs patch releases. As for putting features into minor/patch releases, if we keep the convention of putting new features only into minor releases, my assumption is still that it's unlikely people will want to get them into branch-2 (based on the 2.10.0 release process). For the Java 11 issue, we haven't even really removed support for Java 7 in branch-2 (much less Java 8), so I feel moving to Java 11 would go along with a move to branch 3. And as you mentioned, if people really want to use Java 11 on branch-2, we can always revive branch-2. But for now I think the convenience of not needing to port to both branch-2 and branch-2.10 (and below) outweighs the cost of potentially needing to revive branch-2.
>
> Jonathan Hung
>
> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang <ey...@cloudera.com> wrote:
>
>> +1 for 2.10.x as last release for 2.x version.
>>
>> Software would become more compatible when more companies stress test the same software and make improvements in trunk. Some may be extra cautious about moving up the version because of internal obligations to keep things running. Company obligations should not be the driving force to maintain Hadoop branches. There is no proper collaboration in the community when every name brand company maintains its own Hadoop 2.x version. I think it would be more healthy for the community to reduce the branch forking and spend energy on trunk to harden the software.
>> This will give more confidence to move up the version than trying to fix n permutations of breakage, like Flash fixing the timeline.
>>
>> As the Apache license states, there is no warranty of any kind for code contributions. Fewer community release processes should improve software quality when eyes are on trunk, and help steer toward the same end goals.
>>
>> regards,
>> Eric
>>
>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger <ebad...@verizonmedia.com.invalid> wrote:
>>
>>> Hello all,
>>>
>>> Is it written anywhere what the difference is between a minor release and a point/dot/maintenance (I'll use "point" from here on out) release? I have looked around and I can't find anything other than some compatibility documentation in 2.x that has since been removed in 3.x [1] [2]. I think this would help shape my opinion on whether or not to keep branch-2 alive. My current understanding is that we can't really break compatibility in either a minor or point release. But the only mention of the difference between minor and point releases is how to deal with Stable, Evolving, and Unstable tags, and how to deal with changing default configuration values. So it seems like there really isn't a big official difference between the two. In my mind, the functional difference between the two is that the minor releases may have added features and rewrites, while the point releases only have bug fixes. This might be an incorrect understanding, but that's what I have gathered from watching the releases over the last few years. Whether or not this is a correct understanding, I think that this needs to be documented somewhere, even if it is just a convention.
>>>
>>> Given my assumed understanding of minor vs point releases, here are the pros/cons that I can think of for having a branch-2. Please add on or correct me for anything you feel is missing or inadequate.
>>> Pros:
>>> - Features/rewrites/higher-risk patches are less likely to be put into 2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> Cons:
>>> - Bug fixes are less likely to be put into 2.10.x
>>> - An extra branch to maintain
>>> - Committers have an extra branch (5 vs 4 total branches) to commit patches to if they should go all the way back to 2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> So on the one hand you get added stability in fewer features being committed to 2.10.x, but then on the other you get fewer bug fixes being committed. In a perfect world, we wouldn't have to make this tradeoff. But we don't live in a perfect world and committers will make mistakes either because of lack of knowledge or simply because they made a mistake. If we have a branch-2, committers will forget, not know to, or choose not to (for whatever reason) commit valid bug fixes back all the way to branch-2.10. If we don't have a branch-2, committers who want their borderline risky feature in the 2.x line will err on the side of putting it into branch-2.10 instead of proposing the creation of a branch-2. Clearly I have made quite a few assumptions here based on my own experiences, so I would like to hear if others have similar or opposing views.
>>>
>>> As far as 3.x goes, to me it seems like some of the reasoning for killing branch-2 is due to an effort to push the community towards 3.x. This is why I have added movement to 3.x as both a pro and a con. As a community trying to move forward, keeping as many companies on similar branches as possible is a good way to make sure the code is well-tested. However, from a stability point of view, moving to 3.x is still scary and being able to stay on 2.x until you are comfortable to move is very nice.
>>> The 2.10.0 bridge release effort has been very good at making it possible for people to move from 2.x to 3.x, but the diff between 2.x and 3.x is so large that it is reasonable for companies to want to be extra cautious with 3.x due to potential performance degradation at large scale.
>>>
>>> A question I'm pondering is what happens when we move to Java 11 and someone is still on 2.x? If they want to backport HADOOP-15338 <https://issues.apache.org/jira/browse/HADOOP-15338> for Java 11 support to 2.x, surely not everyone is going to want that (at least not immediately). The 2.10 documentation states, "The JVM requirements will not change across point releases within the same minor release except if the JVM version under question becomes unsupported" [1], so this would warrant a 2.11 release until Java 8 becomes unsupported (though one could argue that it is already unsupported since Oracle is no longer giving public Java 8 updates). If we don't keep branch-2 around now, would a Java 11 backport be the catalyst for a branch-2 revival?
>>>
>>> Not sure if this really leads to any sort of answer from me on whether or not we should keep branch-2 alive, but these are the things that I am weighing in my mind. For me, the bigger problem beyond having branch-2 or not is committers not being on the same page with where they should commit their patches.
>>>
>>> Eric
>>>
>>> [1] https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/Compatibility.html
>>> [2] https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/Compatibility.html
>>>
>>> On Tue, Nov 19, 2019 at 2:49 PM epa...@apache.org <epa...@apache.org> wrote:
>>>
>>> > Hi Konstantin,
>>> >
>>> > Sure, I understand those concerns. On the other hand, I worry about the stability of 2.10, since we will be on it for a couple of years at least.
>>> > I worry that some committers may want to put new features into a branch 2 release, and without a branch-2, they will go directly into 2.10. Since we don't always catch corner cases or performance problems for some time (usually not until the release is deployed to a busy, 4-thousand node cluster), it may be very difficult to back out those changes.
>>> >
>>> > It sounds like I'm in the minority here, so I'm not nixing the idea, but I do have these reservations.
>>> >
>>> > Thanks,
>>> > -Eric
>>> >
>>> > On Tuesday, November 19, 2019, 1:04:15 AM CST, Konstantin Shvachko <shv.had...@gmail.com> wrote:
>>> >
>>> > Hi Eric,
>>> >
>>> > We had a long discussion on this list regarding making the 2.10 release the last of branch-2 releases. We intended 2.10 as a bridge release between Hadoop 2 and 3. We may have bug-fix releases for 2.10, but 2.11 is not in the picture right now, and many people may object to this idea.
>>> >
>>> > I understand Jonathan's proposal as an attempt to
>>> > 1. eliminate confusion about which branches people should commit their back-ports to
>>> > 2. save engineering effort committing to more branches than necessary
>>> >
>>> > "Branches are cheap" as our founder used to say. If we ever decide to release 2.11 we can resurrect the branch.
>>> > Until then I am in favor of Jonathan's proposal +1.
>>> >
>>> > Thanks,
>>> > --Konstantin
>>> >
>>> > On Mon, Nov 18, 2019 at 10:41 AM Jonathan Hung <jyhung2...@gmail.com> wrote:
>>> >
>>> > > Thanks Eric for the comments - regarding your concerns, I feel the pros outweigh the cons. To me, the chances of patch releases on 2.10.x are much higher than a new 2.11 minor release. (There didn't seem to be many people outside of our company who expressed interest in getting new features to branch-2 prior to the 2.10.0 release.)
>>> > > Even now, a few weeks after the 2.10.0 release, there are 29 patches that have gone into branch-2 and 9 in branch-2.10, so it's already diverged quite a bit.
>>> > >
>>> > > In any case, we can always reverse this decision if we really need to, by recreating branch-2. But this proposal would reduce a lot of confusion IMO.
>>> > >
>>> > > Jonathan Hung
>>> > >
>>> > > On Fri, Nov 15, 2019 at 11:41 AM epa...@apache.org <epa...@apache.org> wrote:
>>> > >
>>> > > > Thanks Jonathan for opening the discussion.
>>> > > >
>>> > > > I am not in favor of this proposal. 2.10 was very recently released, and moving to 2.10 will take some time for the community. It seems premature to make a decision at this point that there will never be a need for a 2.11 release.
>>> > > >
>>> > > > -Eric
>>> > > >
>>> > > > On Thursday, November 14, 2019, 8:51:59 PM CST, Jonathan Hung <jyhung2...@gmail.com> wrote:
>>> > > >
>>> > > > Hi folks,
>>> > > >
>>> > > > Given the release of 2.10.0, and the fact that it's intended to be a bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last minor release line in branch-2. Currently, the main issue is that there are many fixes going into branch-2 (the theoretical 2.11.0) that are not going into branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will likely never see the light of day unless they are backported to branch-2.10.
>>> > > >
>>> > > > To do this, I propose we:
>>> > > >
>>> > > > - Delete branch-2.10
>>> > > > - Rename branch-2 to branch-2.10
>>> > > > - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>>> > > >
>>> > > > This way we get all the current branch-2 fixes into the 2.10.x release line.
>>> > > > Then the commit chain will look like: trunk -> branch-3.2 -> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>>> > > >
>>> > > > Thoughts?
>>> > > >
>>> > > > Jonathan Hung
>>> > > >
>>> > > > [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html