[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883289#comment-13883289 ]
Aaron T. Myers commented on HDFS-5138:
--------------------------------------

Hi Suresh, it's obviously fine that you're busy (we all are), but in the future please just let me know that you intend to review it and that we should hold off on committing it for a bit. I reached out to you more than once last week to ask about a review timeline and never heard back from you, so, given the silence, I asked Todd to commit it (I'm traveling at the moment).

bq. I had brought up one issue about potentially losing editlogs on JournalNode.

This scenario isn't possible as you described, because either the pre-upgrade or the upgrade stage (depending upon when the original failure happened) will fail to rename the dir if it already exists. That said, your points about improving the documentation and the recovery procedure in the event of partial failure of the upgrade are well taken and certainly worth addressing. Upon looking at it further, I also think we should change a few of the assertions in the code to be actual exceptions, since we shouldn't have to be running with assertions enabled to check these error conditions. That should harden all of these code paths a bit more.

bq. please address the comments before merging to branch-2.

OK, I've filed HDFS-5840 to address your latest comments. Please follow that JIRA and review it as promptly as you can. I'm going to resolve this JIRA for now with a fix version of 3.0.0 and will merge both JIRAs to branch-2 when HDFS-5840 is completed.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way
> to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned
> back on without involving DNs, things will work, but finalizeUpgrade won't
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade
> snapshots won't get removed.
> We will need different ways of doing layout upgrade and upgrade snapshot.
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If
> there is a reasonable workaround that does not increase maintenance window
> greatly, we can lower its priority from blocker to critical.