[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883273#comment-13883273 ]
Aaron T. Myers commented on HDFS-5840:
--------------------------------------
From Suresh:
I am adding information about the design as I understand it. Let me know if I
got anything wrong.
*Upgrade preparation:*
# New bits are installed on the cluster nodes.
# The cluster is brought down.
*Upgrade:* For an HA setup, choose one of the NameNodes on which to initiate the
upgrade and start it with the -upgrade flag.
# The NN performs preupgrade for all non-shared storage directories by moving
current to previous.tmp and creating a new current.
#* A failure here is fine: NN startup fails, and the storage directories are
recovered on the next upgrade attempt.
# The NN performs preupgrade of the shared edits (NFS/JournalNodes) over RPC.
Each JournalNode's current is moved to previous.tmp and a new current is created.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the
editlog directory could be lost on that JN. Restarting the JN does not fix the
issue.
# The NN performs the upgrade of the non-shared edits by writing the new CTIME
to current and moving previous.tmp to previous.
# The NN performs the upgrade of the shared edits (NFS/JournalNodes) over RPC.
Each JournalNode's current gets the new CTIME and previous.tmp is moved to
previous.
# We need to document that all the JournalNodes must be up. If a JN is
irrecoverably lost, configuration must be changed to exclude the JN.
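To make the directory transitions in the upgrade steps above concrete, here is a minimal sketch for a single storage directory (local or on a JN). It assumes a simplified layout of a root holding current/, previous.tmp/ and previous/, with a hypothetical CTIME marker file standing in for the real metadata; UpgradeDirSketch and its methods are made-up names, not the HDFS implementation. It shows one way to avoid the retry problem noted above: preupgrade refuses to run over leftover state, and a retry first calls recover(), which either finishes a half-done upgrade or restores the old current.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of one storage directory; not HDFS code.
final class UpgradeDirSketch {
  private UpgradeDirSketch() {}

  // Preupgrade: current -> previous.tmp, then create a fresh, empty current.
  static void preUpgrade(Path root) throws IOException {
    Path current = root.resolve("current");
    Path previousTmp = root.resolve("previous.tmp");
    if (Files.exists(previousTmp) || Files.exists(root.resolve("previous"))) {
      // Refuse to run over leftover state; the caller must recover() first.
      throw new IOException("Leftover upgrade state in " + root);
    }
    Files.move(current, previousTmp, StandardCopyOption.ATOMIC_MOVE);
    Files.createDirectory(current);
  }

  // Upgrade: record the new cTime in current, then previous.tmp -> previous.
  static void upgrade(Path root, long newCTime) throws IOException {
    Files.write(root.resolve("current").resolve("CTIME"),
        Long.toString(newCTime).getBytes(StandardCharsets.UTF_8));
    Files.move(root.resolve("previous.tmp"), root.resolve("previous"),
        StandardCopyOption.ATOMIC_MOVE);
  }

  // Run on the next attempt: finish a half-done upgrade, or undo a preupgrade.
  static void recover(Path root) throws IOException {
    Path current = root.resolve("current");
    Path previousTmp = root.resolve("previous.tmp");
    if (!Files.exists(previousTmp)) {
      return; // nothing was left half-done
    }
    if (Files.exists(current.resolve("CTIME"))) {
      // The new cTime was written, so only the final rename is missing.
      Files.move(previousTmp, root.resolve("previous"), StandardCopyOption.ATOMIC_MOVE);
    } else {
      // Upgrade never ran: discard the new current and restore the old one.
      deleteRecursively(current);
      Files.move(previousTmp, current, StandardCopyOption.ATOMIC_MOVE);
    }
  }

  private static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p); // children sort after their parent, so they go first
    }
  }
}
```

The ATOMIC_MOVE renames assume all three directories sit on the same file system, as they do under one storage root.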
*Rollback:* The NN is started with the -rollback flag.
# For all the non-shared directories, the NN checks canRollBack, which
essentially ensures that a previous directory with the right layout version
exists.
# For all the shared directories, the NN checks canRollBack, which essentially
ensures that a previous directory with the right layout version exists.
# The NN performs rollback for the shared directories (moving previous to current).
#* If rollback fails on one of the JNs, the directories are left in an
inconsistent state. I think any retry of the rollback will fail and will require
moving files around manually. I do not think restarting the JN fixes this.
# We need to document that all the JournalNodes must be up. If a JN is
irrecoverably lost, configuration must be changed to exclude the JN.
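A tiny in-memory model of the rollback failure described above, as an illustration only: RollbackSketch, Dir and the layout numbers are hypothetical, not HDFS classes. Even when every canRollBack check passes up front, a failure partway through the second loop leaves some JNs rolled back and others not, and a retry then fails its own canRollBack check on the JNs that already rolled back, which is the manual-cleanup situation described.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical in-memory model; Dir stands in for one JN's (or local) storage.
final class RollbackSketch {
  private RollbackSketch() {}

  static final class Dir {
    final String name;
    Integer currentLayout;   // layout version in current/
    Integer previousLayout;  // layout version in previous/, null if absent
    boolean failOnRollback;  // simulates a JN that errors mid-rollback

    Dir(String name, int currentLayout, int previousLayout, boolean failOnRollback) {
      this.name = name;
      this.currentLayout = currentLayout;
      this.previousLayout = previousLayout;
      this.failOnRollback = failOnRollback;
    }

    boolean canRollBack(int expectedLayout) {
      return previousLayout != null && previousLayout == expectedLayout;
    }

    void doRollback() throws IOException {
      if (failOnRollback) {
        throw new IOException(name + ": rollback failed");
      }
      currentLayout = previousLayout; // previous -> current
      previousLayout = null;
    }
  }

  static void rollBackAll(List<Dir> dirs, int expectedLayout) throws IOException {
    // Phase 1: check every directory before changing any of them.
    for (Dir d : dirs) {
      if (!d.canRollBack(expectedLayout)) {
        throw new IOException("Cannot roll back " + d.name);
      }
    }
    // Phase 2: no transaction spans the directories, so a failure here
    // leaves the earlier ones rolled back and the later ones untouched.
    for (Dir d : dirs) {
      d.doRollback();
    }
  }
}
```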
*Finalize:* DFSAdmin command is run to finalize the upgrade.
# The active NN finalizes the editlog. If the JNs fail to finalize, the active
NN fails to finalize. However, it is possible that the standby finalizes,
leaving the cluster in an inconsistent state.
# We need to document that all the JournalNodes must be up. If a JN is
irrecoverably lost, configuration must be changed to exclude the JN.
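Related to the finalize failure above: retrying finalize after a partial failure is much easier to reason about if finalizing a directory that is already finalized is a no-op rather than an error. A minimal sketch of that idea for one directory, assuming finalize means removing previous/ by first renaming it to a temporary finalized.tmp/; FinalizeSketch and finalizeDir are hypothetical names, not HDFS APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single-directory finalize; not the HDFS implementation.
final class FinalizeSketch {
  private FinalizeSketch() {}

  // Returns true if this call removed previous/, false if it was already gone.
  static boolean finalizeDir(Path root) throws IOException {
    Path previous = root.resolve("previous");
    Path tmp = root.resolve("finalized.tmp");
    if (Files.exists(previous)) {
      // Rename first so a crash never leaves a half-deleted previous/.
      Files.move(previous, tmp, StandardCopyOption.ATOMIC_MOVE);
    } else if (!Files.exists(tmp)) {
      return false; // already finalized: a retry is a no-op, not an error
    }
    deleteRecursively(tmp);
    return true;
  }

  private static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p); // children sort after their parent, so they go first
    }
  }
}
```

This does not by itself stop the standby from finalizing on its own; it only makes a repeated finalize converge instead of failing on the JNs that already finished.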
Comments on the code in the patch (this is almost complete):
Comments:
# Minor nit: there are some whitespace changes.
# assertAllResultsEqual: can the for loop just start with i = 1? Also, if the
collection objects has size zero or one, the method can return early. Is there a
need to call objects.toArray() for these early checks? With that change, the
findbugs exclude may not be necessary.
# Unit tests can be added for the methods isAtLeastOneActive,
getRpcAddressesForNameserviceId and getProxiesForAllNameNodesInNameservice (I
am okay if this is done in a separate JIRA).
# Finalizing upgrade is quite tricky. Consider the following scenarios:
#* One NN is active and the other is standby - works fine
#* One NN is active and the other is down or all NNs - the finalize command
throws an exception, and the user will not know whether it has succeeded or
failed, or what to do next
#* No active NN - throws an exception saying it cannot finalize with no active NN
# The BlockPoolSliceStorage.java change seems unnecessary.
# Why is {{throw new AssertionError("Unreachable code.");}} in the
QuorumJournalManager.java methods?
# FSImage#doRollBack(): when canRollBack is false after checking whether the
non-shared directories can roll back, an exception must be thrown immediately
instead of going on to check the shared editlog. Also, printing a Log.info
message when storages can be rolled back will help with debugging.
# FSEditLog#canRollBackSharedLog should accept StorageInfo instead of Storage.
# QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError
(from DFSUtil.assertAllResultsEqual()). Is that the right exception to expose,
or should it be IOException?
# NameNode startup throws AssertionError with the -rollback option. I think we
should throw IOException, which is how all the other failures are indicated.
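A sketch of the assertAllResultsEqual change suggested in the review: return early for zero or one element without calling toArray(), and start the loop at i = 1. It uses a placeholder class name (ResultsCheck) rather than DFSUtil, and the null handling via java.util.Objects.equals is an assumption, not taken from the patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Objects;

// Placeholder class; in the patch the method lives in DFSUtil.
final class ResultsCheck {
  private ResultsCheck() {}

  static void assertAllResultsEqual(Collection<?> objects) throws AssertionError {
    // Zero or one result is trivially consistent, so skip the toArray() copy.
    if (objects.size() <= 1) {
      return;
    }
    Object[] results = objects.toArray();
    // Element 0 is the reference value, so comparison starts at i = 1.
    for (int i = 1; i < results.length; i++) {
      if (!Objects.equals(results[0], results[i])) {
        throw new AssertionError(
            "Not all elements match in results: " + Arrays.toString(results));
      }
    }
  }
}
```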
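A sketch combining the FSImage#doRollBack and AssertionError comments: check the non-shared directories first and fail fast with an IOException before the shared log is consulted, log each location that can be rolled back, and convert an AssertionError from the shared-log check into an IOException. RollbackCheckSketch and RollbackCheck are hypothetical stand-ins, and java.util.logging is used here in place of Hadoop's own logger.

```java
import java.io.IOException;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-ins; not FSImage/QuorumJournalManager code.
final class RollbackCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(RollbackCheckSketch.class.getName());

  private RollbackCheckSketch() {}

  interface RollbackCheck {
    String name();
    boolean canRollBack() throws IOException;
  }

  static void checkCanRollBack(List<RollbackCheck> nonShared, RollbackCheck shared)
      throws IOException {
    // Fail fast: the shared log is never consulted if a local directory fails.
    for (RollbackCheck dir : nonShared) {
      if (!dir.canRollBack()) {
        throw new IOException("Cannot roll back storage directory " + dir.name());
      }
      LOG.info("Storage directory " + dir.name() + " can be rolled back");
    }
    boolean sharedOk;
    try {
      sharedOk = shared.canRollBack();
    } catch (AssertionError e) {
      // E.g. journals gave different answers; report it like other failures.
      throw new IOException("Inconsistent rollback state for " + shared.name(), e);
    }
    if (!sharedOk) {
      throw new IOException("Cannot roll back shared log " + shared.name());
    }
    LOG.info("Shared log " + shared.name() + " can be rolled back");
  }
}
```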
> Follow-up to HDFS-5138 to improve error handling during partial upgrade
> failures
> --------------------------------------------------------------------------------
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Fix For: 3.0.0
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already
> been committed to trunk. This JIRA is to address those. See the first comment
> of this JIRA for the full content of the review.