[ https://issues.apache.org/jira/browse/HADOOP-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466814 ]

Sameer Paranjpye commented on HADOOP-702:
-----------------------------------------

After some discussion with Konstantin, Milind, Owen and Nigel, it feels like we 
need some amendments to the design for upgrades and rollbacks. The most 
significant change is in the area of keeping multiple snapshots with different 
FSSIDs.

The fundamental problem with allowing multiple FSSIDs, each representing a 
different filesystem state, is that these 'snapshots' decay over time unless 
they are actively managed. There is no monitoring or replication of blocks in a 
snapshot, so datanodes going down can cause bit rot and data loss. Data 
corruption also goes undetected, since clients never read from snapshots. 
Allowing multiple FSSIDs also significantly grows the number of states the 
filesystem can be in and makes the number of corner cases that need to be 
handled explode (particularly on the datanodes). Further, the primary 
motivation for this design is to protect filesystem data in the face of 
software upgrades and rollbacks. Snapshots were a side-effect of the design, 
but they don't feel like a hard requirement at this point.
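The decay problem above can be sketched in a few lines. This is an illustrative model, not actual HDFS code: the class name, fields, and target replication factor are all hypothetical. The point is that a replication monitor which only scans the active namespace will restore lost replicas of active blocks, while replicas of snapshot-only blocks are lost for good.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why unmanaged snapshots decay: the replication
// monitor repairs under-replicated blocks in the active namespace only,
// so snapshot-only blocks lose replicas permanently as datanodes fail.
public class SnapshotDecay {
    static final int TARGET_REPLICATION = 3; // assumed replication factor

    // block id -> current replica count
    final Map<String, Integer> activeBlocks = new HashMap<>();
    final Map<String, Integer> snapshotOnlyBlocks = new HashMap<>();

    // A datanode failure removes one replica of each block it held.
    void loseReplica(String blockId) {
        activeBlocks.computeIfPresent(blockId, (k, v) -> v - 1);
        snapshotOnlyBlocks.computeIfPresent(blockId, (k, v) -> v - 1);
    }

    // The monitor re-replicates under-replicated *active* blocks only;
    // snapshot-only blocks are never scanned.
    void runReplicationMonitor() {
        activeBlocks.replaceAll((k, v) -> Math.max(v, TARGET_REPLICATION));
    }

    public static void main(String[] args) {
        SnapshotDecay fs = new SnapshotDecay();
        fs.activeBlocks.put("blk_1", TARGET_REPLICATION);
        fs.snapshotOnlyBlocks.put("blk_2", TARGET_REPLICATION);

        fs.loseReplica("blk_1");   // a datanode holding both blocks dies
        fs.loseReplica("blk_2");
        fs.runReplicationMonitor();

        System.out.println("active blk_1 replicas: " + fs.activeBlocks.get("blk_1"));
        System.out.println("snapshot blk_2 replicas: " + fs.snapshotOnlyBlocks.get("blk_2"));
    }
}
```

Run long enough, the snapshot block's replica count only goes down, which is exactly the decay the comment describes.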

The other important change is much tighter integration of the Namenode and 
Datanodes. The new design requires that the Namenode and Datanodes run the 
same software version. This is a much stricter requirement than having them 
speak the same protocol version, but given that replication and layout can 
change between software revisions, it seems reasonable to enforce. Note that 
this does *not* affect HDFS clients, which continue to require protocol 
compatibility only.
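The two-tier check could look roughly like the sketch below. All names and version strings here are hypothetical, not the actual HDFS handshake: it only illustrates the distinction drawn above, namely that a datanode must match the namenode's exact software version, while a client only needs a compatible protocol version.

```java
// Hypothetical sketch of the version checks described above (not the
// real HDFS registration code): datanodes need an exact software-version
// match; clients need only a matching protocol version.
public class VersionCheck {
    static final String NAMENODE_BUILD_VERSION = "0.12.0-r501234"; // assumed
    static final int NAMENODE_PROTOCOL_VERSION = 10;               // assumed

    // Datanode registration: exact software version required, since
    // replication behavior and on-disk layout can differ between builds.
    static boolean acceptDatanode(String datanodeBuildVersion) {
        return NAMENODE_BUILD_VERSION.equals(datanodeBuildVersion);
    }

    // Client connection: wire-protocol compatibility is sufficient.
    static boolean acceptClient(int clientProtocolVersion) {
        return clientProtocolVersion == NAMENODE_PROTOCOL_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(acceptDatanode("0.12.0-r501234")); // same build
        System.out.println(acceptDatanode("0.12.1-r501300")); // build mismatch
        System.out.println(acceptClient(10));                 // protocol match
    }
}
```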

Konstantin will be publishing an updated document shortly.



> DFS Upgrade Proposal
> --------------------
>
>                 Key: HADOOP-702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-702
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Konstantin Shvachko
>         Assigned To: Konstantin Shvachko
>         Attachments: DFSUpgradeProposal.html, DFSUpgradeProposal2.html, 
> DFSUpgradeProposal3.html, FSStateTransition.html, TestPlan-HdfsUpgrade.html
>
>
> Currently the DFS cluster upgrade procedure is manual.
> http://wiki.apache.org/lucene-hadoop/Hadoop_Upgrade
> It is rather complicated and does not guarantee data recoverability in case 
> of software errors or administrator mistakes.
> This is a description of utilities that make the upgrade process almost 
> automatic and minimize the chance of losing or corrupting data.
> Please see the attached html file for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.