[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883273#comment-13883273
 ] 

Aaron T. Myers commented on HDFS-5840:
--------------------------------------

From Suresh:

I am adding information about the design, the way I understand it. Let me know 
if I got it wrong.
*Upgrade preparation:*
# New bits are installed on the cluster nodes.
# The cluster is brought down.

*Upgrade:* For HA setup, choose one of the namenodes to initiate upgrade on and 
start it with -upgrade flag.
# NN performs preupgrade for all non shared storage directories by moving 
current to previous.tmp and creating new current.
#* Failure here is fine: NN startup fails, and on the next upgrade attempt the 
storage directories are recovered.
# NN performs preupgrade of shared edits (NFS/JournalNodes) over RPC. Each 
JournalNode's current is moved to previous.tmp and a new current is created.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the 
editlog directory could be lost on that JN. Restarting the JN does not fix the 
issue.
# NN performs upgrade of non shared edits by writing new CTIME to current and 
moving previous.tmp to previous.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the 
editlog directory could be lost on that JN. Restarting the JN does not fix the 
issue.
# NN performs upgrade of shared edits (NFS/JournalNodes) over RPC. Each 
JournalNode's current gets the new CTIME and previous.tmp is moved to previous.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
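
To make the directory transitions in the upgrade steps concrete, here is a 
minimal sketch of the two phases on a single local storage directory. It is an 
illustration only, not the HDFS Storage code: the class and method names are 
hypothetical, the VERSION file holding a single cTime line is a simplifying 
assumption, and recovery from a partial failure is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration of the preupgrade/upgrade directory moves. */
final class UpgradeSketch {

    /** Preupgrade: move current to previous.tmp and create a new, empty current. */
    static void doPreUpgrade(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        Path previousTmp = storageDir.resolve("previous.tmp");
        // A rename keeps the old state intact under previous.tmp.
        Files.move(current, previousTmp, StandardCopyOption.ATOMIC_MOVE);
        Files.createDirectory(current);
    }

    /** Upgrade: write the new CTIME into current, then move previous.tmp to previous. */
    static void doUpgrade(Path storageDir, long newCTime) throws IOException {
        Path current = storageDir.resolve("current");
        // Assumption: a one-line VERSION file stands in for the real storage metadata.
        byte[] version = ("cTime=" + newCTime + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(current.resolve("VERSION"), version);
        Files.move(storageDir.resolve("previous.tmp"),
                   storageDir.resolve("previous"),
                   StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The point of the two phases is that a failure before the final rename leaves 
previous.tmp behind as a marker that the upgrade did not complete. The concern 
raised above is that this marker is only safe to act on locally; across several 
JNs a reattempt can see a mix of states.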

*Rollback:* NN is started with rollback flag
# For all the non shared directories, the NN checks canRollBack, essentially 
ensuring that a previous directory with the right layout version exists.
# For all the shared directories, the NN checks canRollBack, essentially 
ensuring that a previous directory with the right layout version exists.
# NN performs rollback for shared directories (moving previous to current)
#* If rollback fails on one of the JNs, the directories are left in an 
inconsistent state. I think any attempt at retrying rollback will fail and will 
require manually moving files around. I do not think restarting the JN fixes 
this.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
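
The check-then-move shape of rollback can be sketched as follows. This is a 
simplified illustration with hypothetical names, not the FSImage code: it only 
tests that previous exists (the layout version check is omitted), and parking 
the old current under removed.tmp before deleting it is an assumption of the 
sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration of a two-phase rollback over several directories. */
final class RollbackSketch {

    /** Simplified check: a previous directory must exist (layout version check omitted). */
    static boolean canRollBack(Path storageDir) {
        return Files.isDirectory(storageDir.resolve("previous"));
    }

    static void rollBackAll(List<Path> storageDirs) throws IOException {
        // Phase 1: verify every directory before touching any of them.
        for (Path dir : storageDirs) {
            if (!canRollBack(dir)) {
                throw new IOException("Cannot roll back " + dir + ": no previous directory");
            }
        }
        // Phase 2: move previous to current in each directory.
        for (Path dir : storageDirs) {
            Path current = dir.resolve("current");
            Path removedTmp = dir.resolve("removed.tmp"); // assumption of this sketch
            Files.move(current, removedTmp, StandardCopyOption.ATOMIC_MOVE);
            Files.move(dir.resolve("previous"), current, StandardCopyOption.ATOMIC_MOVE);
            deleteRecursively(removedTmp);
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(root)) {
            deepestFirst = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }
}
```

Checking everything first narrows the window for inconsistency but does not 
close it: a failure in phase 2 still leaves some directories rolled back and 
others not, which is the situation described for the JNs above.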

*Finalize:* DFSAdmin command is run to finalize the upgrade.
# Active NN performs finalizing of editlog. If the JNs fail to finalize, the 
active NN fails to finalize. However, it is possible that the standby 
finalizes, leaving the cluster in an inconsistent state.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.

Comments on the code in the patch (this is almost complete):
Comments:
# Minor nit: there are some white space changes
# assertAllResultsEqual - for loop can just start with i = 1? Also if the 
collection objects is of size zero or one, the method can return early. Is 
there a need to do object.toArray() for these early checks? With that, perhaps 
the findbugs exclude may not be necessary.
# Unit test can be added for methods isAtLeastOneActive, 
getRpcAddressesForNameserviceId and getProxiesForAllNameNodesInNameservice (I 
am okay if this is done in a separate jira)
# Finalizing upgrade is quite tricky. Consider the following scenarios:
#* One NN is active and the other is standby - works fine
#* One NN is active and the other is down or all NNs - finalize command throws 
exception and the user will not know if it has succeeded or failed and what to 
do next
#* No active NN - throws an exception: cannot finalize with no active NN
# BlockPoolSliceStorage.java change seems unnecessary
# Why is {{throw new AssertionError("Unreachable code.");}} in 
QuorumJournalManager.java methods?
# FSImage#doRollBack() - when canRollBack is false after checking whether the 
non-shared directories can roll back, an exception must be thrown immediately, 
instead of going on to check the shared editlog. Also, printing Log.info when 
storages can be rolled back will help in debugging.
# FSEditlog#canRollBackSharedLog should accept StorageInfo instead of Storage
# QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError 
(from DFSUtil.assertAllResultsEqual()). Is that the right exception to expose 
or IOException?
# Namenode startup throws AssertionError with -rollback option. I think we 
should throw IOException, which is how all the other failures are indicated.
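
To illustrate the suggestions on assertAllResultsEqual (return early for zero 
or one element, compare from the second element on, avoid toArray(), and throw 
IOException rather than AssertionError), here is one possible shape. This is 
only a sketch: the class and method names are hypothetical and this is not the 
code in the patch.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical illustration of an equality check that reports via IOException. */
final class ResultChecks {

    /** Throws IOException unless every element equals the first one. */
    static void checkAllResultsEqual(Collection<?> results) throws IOException {
        // Nothing to compare for empty or single-element collections.
        if (results.size() <= 1) {
            return;
        }
        Iterator<?> it = results.iterator();
        Object first = it.next();
        // Comparison starts at the second element; no array copy is needed.
        while (it.hasNext()) {
            Object next = it.next();
            if (!Objects.equals(first, next)) {
                throw new IOException("Not all results are equal: " + results);
            }
        }
    }
}
```

A caller such as a canRollBack or journal CTime query could then surface 
mismatched JN responses as an ordinary IOException, in line with how the other 
failures are reported.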

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5840
>                 URL: https://issues.apache.org/jira/browse/HDFS-5840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 3.0.0
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
