Thanks Chris. +1 for reverting from 2.7. This is the least we should do. Can you help doing the needful?

I personally am not completely sold on a release with *only* the layout changes. Like I was saying before, we can let specific users backport this into specific 2.x branches they need and leave it only on trunk / branch-2. That said, I would love to hear others’ thoughts on this, but let’s fork that discussion off from this 2.7.3 thread.

Re a fresh 2.8, I have renewed my efforts on 2.8 with a view of cutting an RC in weeks. Not sure if that does or doesn’t help this discussion.

Thanks
+Vinod

> On Apr 5, 2016, at 2:03 PM, Chris Trezzo <ctre...@gmail.com> wrote:
>
> In light of the additional conversation on HDFS-8791, I would like to
> re-propose the following:
>
> 1. Revert the new datanode layout (HDFS-8791) from the 2.7 branch. The
> layout change currently does not support downgrades which breaks our
> upgrade/downgrade policies for dot releases.
>
> 2. Cut a 2.8 release off of the 2.7.3 release with the addition of
> HDFS-8791. This would give customers a stable release that they could
> deploy with the new layout. As discussed on the jira, this is still in line
> with user expectation for minor releases as we have done layout changes in
> a number of 2.x minor releases already. The current 2.8 would become 2.9
> and continue its current release schedule.
>
> What does everyone think? If unsupported downgrades between minor releases
> is still not agreeable, then as stated by Vinod, we would need to either
> add support for downgrades with dn layout changes or revert the layout
> change from branch-2. If we are OK with the layout change in a minor
> release, but think that the issue does not affect enough customers to
> warrant a separate release, we could simply leave it in branch-2 and let it
> be released with the current 2.8.
>
>
> On Mon, Apr 4, 2016 at 1:48 PM, Vinod Kumar Vavilapalli <vino...@apache.org>
> wrote:
>
>> I commented on the JIRA way back (see
>> https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
>> saying what I said below. Unfortunately, I haven’t followed the patch along
>> after my initial comment.
>>
>> This isn’t about any specific release - starting 2.6 we declared support
>> for rolling upgrades and downgrades. Any patch that breaks this should not
>> be in branch-2.
>>
>> Two options from where I stand
>> (1) For folks who worked on the patch: Is there a way to make (a) the
>> upgrade-downgrade seamless for people who don’t care about this (b) and
>> have explicit documentation for people who care to switch this behavior on
>> and are willing to risk not having downgrades. If this means a new
>> configuration property, so be it. It’s a necessary evil.
>> (2) Just let specific users backport this into specific 2.x branches they
>> need and leave it only on trunk.
>>
>> Unless this behavior stops breaking rolling upgrades/downgrades, I think
>> we should just revert it from branch-2 and definitely 2.7.3 as it stands
>> today.
>>
>> +Vinod
>>
>>
>>> On Apr 1, 2016, at 2:54 PM, Chris Trezzo <ctre...@gmail.com> wrote:
>>>
>>> A few thoughts:
>>>
>>> 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
>>> prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
>>> for data nodes depending on your setup.
>>>
>>> 2. We have already deployed this patch internally so, with my Twitter hat
>>> on, I would be perfectly happy as long as it makes it into trunk and 2.8.
>>> That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
>>> releases on a large production cluster that has a diverse set of block ids
>>> without this patch, especially if your data nodes have a large number of
>>> disks or you are using federation. To be clear though: this highly depends
>>> on your setup and at a minimum you should verify that this regression will
>>> not affect you. The current block-id based layout in 2.6.x and 2.7.2 has a
>>> performance regression that gets worse over time. When you see it happening
>>> on a live cluster, it is one of the harder issues to identify a root cause
>>> and debug. I do understand that this is currently only affecting a smaller
>>> number of users, but I also think this number has potential to increase as
>>> time goes on. Maybe we can issue a warning in the release notes for future
>>> 2.7.x and 2.6.x releases?
>>>
>>> 3. One option (this was suggested on HDFS-8791 and I think Sean alluded to
>>> this proposal on this thread) would be to cut a 2.8 release off of the
>>> 2.7.3 release with the new layout. What people currently think of as 2.8
>>> would then become 2.9. This would give customers a stable release that they
>>> could deploy with the new layout and would not break upgrade and downgrade
>>> expectations.
>>>
>>> On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurt...@apache.org> wrote:
>>>
>>>> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
>>>> patch the release to revert HDFS-8791 before pushing it out to production.
>>>> For what it's worth.
>>>>
>>>>
>>>> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.w...@cloudera.com>
>>>> wrote:
>>>>
>>>>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
>>>>> backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
>>>>> HDFS-8578 is a very important related fix since otherwise upgrade will be
>>>>> very slow.
>>>>>
>>>>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.w...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
>>>>>> maintenance release. I've only seen it crop up on a handful of our
>>>>>> customer's clusters, and large users like Twitter and Yahoo that seem to be
>>>>>> more affected are also the most able to patch this change in themselves.
>>>>>>
>>>>>> Layout upgrades are quite disruptive, and I don't think it's worth
>>>>>> breaking upgrade and downgrade expectations when it doesn't affect the (in
>>>>>> my experience) vast majority of users.
>>>>>>
>>>>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
>>>>>> will let him elaborate.
>>>>>>
>>>>>> Best,
>>>>>> Andrew
>>>>>>
>>>>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <bus...@cloudera.com> wrote:
>>>>>>
>>>>>>> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
>>>>>>> if *any* of them end up introducing a regression the inclusion of
>>>>>>> HDFS-8791 means that folks will have cluster downtime in order to back
>>>>>>> things out. If that happens to any substantial number of downstream
>>>>>>> folks, or any particularly vocal downstream folks, then it is very
>>>>>>> likely we'll lose the remaining trust of operators for rolling out
>>>>>>> maintenance releases. That's a pretty steep cost.
>>>>>>>
>>>>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
>>>>>>> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
>>>>>>> unreasonable burden.
>>>>>>>
>>>>>>> I agree that this fix is important, I just think we should either cut
>>>>>>> a version of 2.8 that includes it or find a way to do it that gives an
>>>>>>> operational path for rolling downgrade.
>>>>>>>
>>>>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <j...@hortonworks.com> wrote:
>>>>>>>> Thanks for bringing up this topic, Sean.
>>>>>>>> When I released our latest Hadoop release 2.6.4, the patch of HDFS-8791
>>>>>>>> haven't been committed in so that's why we didn't discuss this earlier.
>>>>>>>> I remember in JIRA discussion, we treated this layout change as a
>>>>>>>> Blocker bug that fixing a significant performance regression before but not
>>>>>>>> a normal performance improvement. And I believe HDFS community already did
>>>>>>>> their best with careful and patient to deliver the fix and other related
>>>>>>>> patches (like upgrade fix in HDFS-8578). Take an example of HDFS-8578, you
>>>>>>>> can see 30+ rounds patch review back and forth by senior committers, not to
>>>>>>>> mention the outstanding performance test data in HDFS-8791.
>>>>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 on
>>>>>>>> 2.7.3. However, that needs Vinod's final confirmation who serves as RM for
>>>>>>>> branch-2.7. In addition, I didn't see any blocker issue to bring it into
>>>>>>>> 2.6.5 now.
>>>>>>>> Just my 2 cents.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Junping
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Sean Busbey <bus...@cloudera.com>
>>>>>>>> Sent: Thursday, March 31, 2016 2:57 PM
>>>>>>>> To: hdfs-...@hadoop.apache.org
>>>>>>>> Cc: Hadoop Common; yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
>>>>>>>> Subject: Re: 2.7.3 release plan
>>>>>>>>
>>>>>>>> A layout change in a maintenance release sounds very risky. I saw some
>>>>>>>> discussion on the JIRA about those risks, but the consensus seemed to
>>>>>>>> be "we'll leave it up to the 2.6 and 2.7 release managers." I thought
>>>>>>>> we did RMs per release rather than per branch? No one claiming to be a
>>>>>>>> release manager ever spoke up AFAICT.
>>>>>>>>
>>>>>>>> Should this change be included? Should it go into a special 2.8
>>>>>>>> release as mentioned in the ticket?
>>>>>>>>
>>>>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
>>>>>>>> <ajisa...@oss.nttdata.co.jp> wrote:
>>>>>>>>> Thank you Vinod!
>>>>>>>>>
>>>>>>>>> FYI: 2.7.3 will be a bit special release.
>>>>>>>>>
>>>>>>>>> HDFS-8791 bumped up the datanode layout version,
>>>>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
>>>>>>>>> is impossible. We can rollback instead.
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
>>>>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Akira
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out (which
>>>>>>>>>> did go out mid February). Got a little busy since.
>>>>>>>>>>
>>>>>>>>>> Following up the 2.7.2 maintenance release, we should work towards a
>>>>>>>>>> 2.7.3. The focus obviously is to have blocker issues [1], bug-fixes and *no*
>>>>>>>>>> features / improvements.
>>>>>>>>>>
>>>>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding blocker
>>>>>>>>>> / critical issues. Will start moving out any tickets that are not blockers
>>>>>>>>>> and/or won’t fit the timeline - there are 3 blockers and 15 critical tickets
>>>>>>>>>> outstanding as of now.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> +Vinod
>>>>>>>>>>
>>>>>>>>>> [1] 2.7.3 release blockers:
>>>>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> busbey
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> busbey
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>>
>>
>>