[
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870341#comment-13870341
]
Konstantin Shvachko commented on HDFS-5535:
-------------------------------------------
Thanks for the design doc, guys. I have a few questions.
(Quotations from the document are in italics.)
# ??The total time required to upgrade a cluster MUST not exceed #Nodes_in_cluster * 10 seconds.??
I am not sure I understand how you measure the time to upgrade. Administrators
should be able to spend as much time as they need. On the other hand, I could
write a script that calls the upgrade commands in sequence, then push a button,
and the upgrade is done for me.
I am just trying to understand the meaning of the requirement.
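For example, reading the bound literally, a 4,000-node cluster would be allowed
4000 * 10 s = 40,000 s, roughly 11 hours, for the whole upgrade; that only makes
sense to me if it is total wall-clock time rather than administrator time per node.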
# ??During upgrade or downgrade, no data loss MUST occur.??
It is not clear what this means if a bug in the new software leads to data loss.
Did you mean that the old software should be able to support whatever state of
the file system is left after the upgrade experiment is terminated?
# Does finalize require a checkpoint in the design?
# ??For rollback, NN read editlog in startup as usual. It stops at the marker
position, writes the fsimage back to disk and then discards the editlog.??
What happens if the edit log is corrupted by the new software and the marker is
not recognizable?
Maybe the NN needs to roll edits in some special way to indicate the start of
the rolling upgrade? (See the sketch after this list.)
# ??Software version is the version of the running software. In the current
rolling upgrade mechanism??
What is the current rolling upgrade mechanism? It would make more sense to me
if the word "current" were removed from the above phrase.
# What is MTTR?
# It looks like Lite-Decom and “Optimizing DN Restart time” are competing
proposals.
Which one do you actually propose? It sounds like both are still being designed.
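To illustrate my question 4 above: a minimal sketch of the "roll edits at the
start of the upgrade" idea, assuming the NN records the first transaction id of
the rolled-in segment. The Segment record and method names are invented for
illustration and are not actual HDFS classes.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not actual HDFS code: if the NN rolls the edit log when
// the rolling upgrade starts and records the first txid of the new segment,
// rollback can drop every segment written after that point without ever
// parsing edits produced by the new software.
public class RollbackSketch {
  record Segment(long firstTxId, long lastTxId) {}

  static List<Segment> segmentsToReplay(List<Segment> segments, long upgradeStartTxId) {
    List<Segment> keep = new ArrayList<>();
    for (Segment s : segments) {
      if (s.firstTxId() < upgradeStartTxId) {
        keep.add(s); // written by the old software: safe to replay
      }
      // Segments at or after the marker are discarded unread, so corruption
      // introduced by the new software cannot break the rollback.
    }
    return keep;
  }

  public static void main(String[] args) {
    List<Segment> all = List.of(
        new Segment(1, 100), new Segment(101, 250), new Segment(251, 300));
    // Suppose the upgrade began when the segment starting at txid 251 was rolled in.
    System.out.println(segmentsToReplay(all, 251)); // keeps [1..100] and [101..250]
  }
}
{code}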
I ask this last question because this seems to be the most intricate part of
the issue. Conceptually, rolling upgrades are possible with a simple patch that
eliminates the Software Version verification (see the sketch below), plus very
careful cluster administration, of course.
And the trick indeed is to avoid client failures, so that HBase and other
applications can keep running during the upgrade.
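To make the "simple patch" concrete, here is a hedged sketch of what I mean by
eliminating the Software Version verification; all names and the layout-version
value are invented for illustration and are not the actual handshake code.
{code:java}
// Illustrative only, not actual HDFS code: relax the exact software-version
// comparison while a rolling upgrade is in progress, but always insist that
// the on-disk layout versions match.
public class VersionCheckSketch {
  static boolean isCompatible(String nnSoftwareVersion, String dnSoftwareVersion,
                              int nnLayoutVersion, int dnLayoutVersion,
                              boolean rollingUpgradeInProgress) {
    if (nnLayoutVersion != dnLayoutVersion) {
      return false; // the on-disk layout must always match
    }
    if (rollingUpgradeInProgress) {
      return true;  // tolerate mixed software versions during the window
    }
    return nnSoftwareVersion.equals(dnSoftwareVersion); // strict otherwise
  }

  public static void main(String[] args) {
    // A 2.2.0 DN talking to a 2.3.0 NN is accepted only mid-upgrade
    // (the layout-version value here is arbitrary).
    System.out.println(isCompatible("2.3.0", "2.2.0", -47, -47, true));  // true
    System.out.println(isCompatible("2.3.0", "2.2.0", -47, -47, false)); // false
  }
}
{code}
The point of the sketch is only that the version check, not the on-disk layout,
is what blocks a mixed-version cluster.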
> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, ha, hdfs-client, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf
>
>
> In order to roll a new HDFS release through a large cluster quickly and
> safely, a few enhancements are needed in HDFS. An initial High level design
> document will be attached to this jira, and sub-jiras will itemize the
> individual tasks.