[ https://issues.apache.org/jira/browse/HADOOP-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467205 ]
Konstantin Shvachko commented on HADOOP-702:
--------------------------------------------

I'd like to emphasize the changes in the design of the upgrade and in the behavior of the system in general. People expressed different opinions during the previous discussion, so if anybody sees problems with the new approach, now would be a good time to speak up.

- No FSSIDs means there will be no possibility to create multiple snapshots of the file system; only one snapshot can exist at any given time. Something like what Doug calls "filesystem checkpoints" above will no longer be possible.
- The requirement of an exact release version match means administrators will have no option to stop the name-node (without stopping the data-nodes) and restart it with updated software, even if no changes to the data layout or the data-node protocol have been made.
- Another important issue in the new design is that data-nodes will decide on their own whether to upgrade or discard the old file system state, based on a comparison of their local data layout version with the name-node's layout version. That is, even if you start the name-node in regular mode, some data-nodes that missed previous upgrade(s) or discard(s) may decide to perform them on their own.

I wrote a test that creates hard links of block files in a new directory (a rough sketch of such a test is included at the end of this message). On my machine a hard link creation takes about 10 milliseconds, which is 6,000 blocks per minute. Depending on your data-node size you can calculate the cluster startup delay.

> DFS Upgrade Proposal
> --------------------
>
>                 Key: HADOOP-702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-702
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Konstantin Shvachko
>         Assigned To: Konstantin Shvachko
>         Attachments: DFSUpgradeProposal.html, DFSUpgradeProposal2.html, DFSUpgradeProposal3.html, FSStateTransition.html, TestPlan-HdfsUpgrade.html, TestPlan-HdfsUpgrade.html
>
>
> Currently the DFS cluster upgrade procedure is manual.
> http://wiki.apache.org/lucene-hadoop/Hadoop_Upgrade
> It is rather complicated and does not guarantee data recoverability in case of software errors or administrator mistakes.
> This is a description of utilities that make the upgrade process almost automatic and minimize the chance of losing or corrupting data.
> Please see the attached html file for details.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
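A minimal sketch of the kind of hard-link timing test mentioned above, for reference. This is not the actual test that was run; it uses the modern java.nio.file API, and the directory layout and the blk_* file naming pattern are assumptions made only for illustration.

import java.io.IOException;
import java.nio.file.*;

/**
 * Rough timing sketch: hard-link every block file from a "current"
 * directory into a new "snapshot" directory and report the rate.
 * Directory names and the blk_* pattern are hypothetical.
 */
public class HardLinkTiming {
  public static void main(String[] args) throws IOException {
    Path current = Paths.get(args.length > 0 ? args[0] : "data/current");
    Path snapshot = Paths.get(args.length > 1 ? args[1] : "data/snapshot");
    Files.createDirectories(snapshot);

    long linked = 0;
    long start = System.currentTimeMillis();
    try (DirectoryStream<Path> blocks = Files.newDirectoryStream(current, "blk_*")) {
      for (Path block : blocks) {
        // A hard link shares the same on-disk data, so no block contents are copied.
        Files.createLink(snapshot.resolve(block.getFileName()), block);
        linked++;
      }
    }
    long elapsedMs = System.currentTimeMillis() - start;

    double msPerLink = linked == 0 ? 0 : (double) elapsedMs / linked;
    System.out.printf("Linked %d blocks in %d ms (%.2f ms/link, ~%.0f blocks/min)%n",
        linked, elapsedMs, msPerLink, msPerLink == 0 ? 0 : 60000.0 / msPerLink);
  }
}

At the reported rate of about 10 ms per link (6,000 blocks per minute), a data-node holding, say, 60,000 blocks would spend roughly 10 minutes on the hard-link phase alone, which gives an idea of the startup delay involved.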