[
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870341#comment-13870341
]
Konstantin Shvachko commented on HDFS-5535:
-------------------------------------------
Thanks for the design doc, guys. I have a few questions.
(Quotations from the document are in italics.)
# ??The total time required to upgrade a cluster MUST not exceed #Nodes_in_cluster * 10 seconds.??
I am not sure I understand how you measure the time to upgrade. Administrators
should be able to spend as much time as they need. On the other hand, I could
write a script that calls the upgrade commands in sequence, then push a button,
and the upgrade is done for me.
I am just trying to understand the meaning of the requirement.
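For example, reading the bound literally, a 4,000-node cluster would be allowed
4000 * 10 s = 40,000 s, roughly 11 hours, for the whole upgrade; that only makes
sense to me if it is total wall-clock time rather than administrator time per node.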
# ??During upgrade or downgrade, no data loss MUST occur.??
It is not clear what this means if a bug in the new software leads to data loss.
Did you mean that the old software should be able to support whatever state of
the file system is left after the upgrade experiment is terminated?
# Does finalize require a checkpoint in the design?
# ??For rollback, NN read editlog in startup as usual. It stops at the marker
position, writes the fsimage back to disk and then discards the editlog.??
What happens if the edit log is corrupted by the new software and the marker is
not recognizable?
Maybe the NN needs to roll edits in some special way to indicate the start of
the rolling upgrade? (See the sketch after this list.)
# ??Software version is the version of the running software. In the current
rolling upgrade mechanism??
What is the current rolling upgrade mechanism? It would make more sense to me
if the word "current" were removed from the above phrase.
# What is MTTR?
# It looks like Lite-Decom and “Optimizing DN Restart time” are competing
proposals.
Which one do you actually propose? It sounds like both are still being designed.
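To illustrate my question 4 above: a minimal sketch of the "roll edits at the
start of the upgrade" idea, assuming the NN records the first transaction id of
the rolled-in segment. The Segment record and method names are invented for
illustration and are not actual HDFS classes.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not actual HDFS code: if the NN rolls the edit log when
// the rolling upgrade starts and records the first txid of the new segment,
// rollback can drop every segment written after that point without ever
// parsing edits produced by the new software.
public class RollbackSketch {
  record Segment(long firstTxId, long lastTxId) {}

  static List<Segment> segmentsToReplay(List<Segment> segments, long upgradeStartTxId) {
    List<Segment> keep = new ArrayList<>();
    for (Segment s : segments) {
      if (s.firstTxId() < upgradeStartTxId) {
        keep.add(s); // written by the old software: safe to replay
      }
      // Segments at or after the marker are discarded unread, so corruption
      // introduced by the new software cannot break the rollback.
    }
    return keep;
  }

  public static void main(String[] args) {
    List<Segment> all = List.of(
        new Segment(1, 100), new Segment(101, 250), new Segment(251, 300));
    // Suppose the upgrade began when the segment starting at txid 251 was rolled in.
    System.out.println(segmentsToReplay(all, 251)); // keeps [1..100] and [101..250]
  }
}
{code}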
I ask this last question because this seems to be the most intricate part of
the issue. Conceptually, rolling upgrades are possible with a simple patch that
eliminates the Software Version verification (see the sketch below), plus very
careful cluster administration, of course.
And the trick indeed is to avoid client failures, so that HBase and other
applications can keep running during the upgrade.
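To make the "simple patch" concrete, here is a hedged sketch of what I mean by
eliminating the Software Version verification; all names and the layout-version
value are invented for illustration and are not the actual handshake code.
{code:java}
// Illustrative only, not actual HDFS code: relax the exact software-version
// comparison while a rolling upgrade is in progress, but always insist that
// the on-disk layout versions match.
public class VersionCheckSketch {
  static boolean isCompatible(String nnSoftwareVersion, String dnSoftwareVersion,
                              int nnLayoutVersion, int dnLayoutVersion,
                              boolean rollingUpgradeInProgress) {
    if (nnLayoutVersion != dnLayoutVersion) {
      return false; // the on-disk layout must always match
    }
    if (rollingUpgradeInProgress) {
      return true;  // tolerate mixed software versions during the window
    }
    return nnSoftwareVersion.equals(dnSoftwareVersion); // strict otherwise
  }

  public static void main(String[] args) {
    // A 2.2.0 DN talking to a 2.3.0 NN is accepted only mid-upgrade
    // (the layout-version value here is arbitrary).
    System.out.println(isCompatible("2.3.0", "2.2.0", -47, -47, true));  // true
    System.out.println(isCompatible("2.3.0", "2.2.0", -47, -47, false)); // false
  }
}
{code}
The point of the sketch is only that the version check, not the on-disk layout,
is what blocks a mixed-version cluster.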
> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, ha, hdfs-client, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf
>
>
> In order to roll a new HDFS release through a large cluster quickly and
> safely, a few enhancements are needed in HDFS. An initial High level design
> document will be attached to this jira, and sub-jiras will itemize the
> individual tasks.