[
https://issues.apache.org/jira/browse/HADOOP-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466814
]
Sameer Paranjpye commented on HADOOP-702:
-----------------------------------------
After some discussion with Konstantin, Milind, Owen, and Nigel, it feels like we
need some amendments to the design for upgrades and rollbacks. The most
significant delta is in the area of keeping multiple snapshots with different
FSSIDs.
The fundamental problem with allowing multiple FSSIDs, each representing a
different filesystem state, is that these 'snapshots' decay over time unless
they are actively managed. Blocks in a snapshot are not monitored or
re-replicated, so Datanodes going down can cause bit rot and data loss.
Data corruption also goes undetected, since clients never read from snapshots.
Allowing multiple FSSIDs also causes the number of states the filesystem can be
in to grow significantly, and the number of corner cases that need to be handled
(particularly on the Datanodes) to explode. Further, the primary motivation for
this design is to protect filesystem data in the face of software upgrades and
rollbacks. Snapshots were a side effect of the design, but they do not feel like
a hard requirement at this point.
The other important change is much tighter integration of the Namenode and
Datanodes. The new design requires that the Namenode and Datanodes run
the same software version. This is a much stricter requirement than having them
speak the same protocol version, but given that replication and layout can
change with software revisions, it seems reasonable to enforce. Note that this
does *not* affect HDFS clients, which continue to require protocol
compatibility only.
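The distinction above can be sketched as two separate admission checks: an exact build-version match for Datanode registration, but only a protocol-version match for clients. This is a minimal illustrative sketch; the class and method names are hypothetical, not the actual Hadoop API.

```java
// Hypothetical sketch of the stricter handshake described above.
// A Datanode must match the Namenode's software build exactly; a client
// only needs to speak a compatible RPC protocol version.
public class VersionCheck {

    /** Datanode registration: exact software-version match required. */
    public static boolean datanodeMayRegister(String namenodeBuild,
                                              String datanodeBuild) {
        return namenodeBuild.equals(datanodeBuild);
    }

    /** Client connection: only the protocol version must agree. */
    public static boolean clientMayConnect(long serverProtocolVersion,
                                           long clientProtocolVersion) {
        return serverProtocolVersion == clientProtocolVersion;
    }

    public static void main(String[] args) {
        // Different builds that happen to share a protocol version:
        // the client is admitted, the Datanode is not.
        System.out.println(datanodeMayRegister("0.14.0", "0.14.1")); // false
        System.out.println(clientMayConnect(11L, 11L));              // true
    }
}
```

The point of keeping the two checks separate is that a rolling client population can survive a cluster upgrade, while the Namenode and Datanodes move in lockstep because on-disk layout and replication behavior may change between builds.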
Konstantin will be publishing an updated document shortly.
> DFS Upgrade Proposal
> --------------------
>
> Key: HADOOP-702
> URL: https://issues.apache.org/jira/browse/HADOOP-702
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: Konstantin Shvachko
> Assigned To: Konstantin Shvachko
> Attachments: DFSUpgradeProposal.html, DFSUpgradeProposal2.html,
> DFSUpgradeProposal3.html, FSStateTransition.html, TestPlan-HdfsUpgrade.html
>
>
> Currently the DFS cluster upgrade procedure is manual.
> http://wiki.apache.org/lucene-hadoop/Hadoop_Upgrade
> It is rather complicated and does not guarantee data recoverability in the
> case of software errors or administrator mistakes.
> This is a description of utilities that make the upgrade process almost
> automatic and minimize the chance of losing or corrupting data.
> Please see the attached html file for details.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.