[ https://issues.apache.org/jira/browse/HADOOP-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466814 ]
Sameer Paranjpye commented on HADOOP-702:
-----------------------------------------

After some discussion with Konstantin, Milind, Owen, and Nigel, it appears we need some amendments to the design for upgrades and rollbacks.

The most significant delta is in the area of keeping multiple snapshots with different FSSIDs. The fundamental problem with allowing multiple FSSIDs, each representing a different filesystem state, is that these 'snapshots' decay over time unless they are actively managed: blocks in a snapshot are neither monitored nor replicated, so Datanodes going down can cause bit rot and data loss, and data corruption goes undetected since clients never read from snapshots. Allowing multiple FSSIDs also significantly increases the number of states the filesystem can be in, and makes the number of corner cases that need to be handled explode (particularly on the Datanodes). Further, the primary motivation for this design is to protect filesystem data in the face of software upgrades and rollbacks; snapshots were a side effect of the design, but they do not feel like a hard requirement at this point.

The other important change is much tighter integration of the Namenode and Datanodes. The new design requires that the Namenode and Datanodes run the same software version. This is a much stricter requirement than having them speak the same protocol versions, but given that replication and layout can change between software revisions, it seems reasonable to enforce. Note that this does *not* affect HDFS clients, which continue to require protocol compatibility only.

Konstantin will publish an updated document shortly.
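To make the distinction concrete, here is a minimal sketch of the two checks described above. All names, version values, and exception choices are illustrative assumptions, not the actual HDFS API: a Datanode must present an exactly matching software version at registration, while a client only needs a matching wire-protocol version.

```java
// Hypothetical sketch of the two compatibility checks; names and
// version values are illustrative, not taken from the Hadoop codebase.
public class VersionHandshake {
    static final String SOFTWARE_VERSION = "0.12.0"; // assumed build version
    static final int CLIENT_PROTOCOL_VERSION = 11;   // assumed wire protocol

    /** Datanode registration: exact software-version match required,
     *  since replication and on-disk layout can change between revisions. */
    static void registerDatanode(String datanodeVersion) {
        if (!SOFTWARE_VERSION.equals(datanodeVersion)) {
            throw new IllegalStateException("Datanode version "
                + datanodeVersion + " does not match Namenode version "
                + SOFTWARE_VERSION);
        }
    }

    /** Client connection: only the protocol version must match. */
    static void connectClient(int clientProtocolVersion) {
        if (clientProtocolVersion != CLIENT_PROTOCOL_VERSION) {
            throw new IllegalStateException("Protocol mismatch: client speaks "
                + clientProtocolVersion + ", server expects "
                + CLIENT_PROTOCOL_VERSION);
        }
    }

    public static void main(String[] args) {
        registerDatanode("0.12.0"); // accepted: same software version
        connectClient(11);          // accepted: same protocol version
        try {
            registerDatanode("0.11.2"); // rejected: versions differ
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the Datanode check compares the full build version string, whereas the client check compares only the protocol number, so a client built against an older release keeps working as long as the wire protocol is unchanged.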
> DFS Upgrade Proposal
> --------------------
>
>                 Key: HADOOP-702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-702
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Konstantin Shvachko
>         Assigned To: Konstantin Shvachko
>         Attachments: DFSUpgradeProposal.html, DFSUpgradeProposal2.html, DFSUpgradeProposal3.html, FSStateTransition.html, TestPlan-HdfsUpgrade.html, TestPlan-HdfsUpgrade.html
>
> Currently the DFS cluster upgrade procedure is manual.
> http://wiki.apache.org/lucene-hadoop/Hadoop_Upgrade
> It is rather complicated and does not guarantee data recoverability in case of software errors or administrator mistakes.
> This is a description of utilities that make the upgrade process almost automatic and minimize the chance of losing or corrupting data.
> Please see the attached html file for details.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.