[
https://issues.apache.org/jira/browse/HADOOP-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590551#action_12590551
]
Konstantin Shvachko commented on HADOOP-2656:
---------------------------------------------
Maybe this is a good time to return to the question of renaming HDFS blocks:
stop generating block ids randomly and generate them sequentially instead.
This is related to my earlier question of whether the name-node needs to store
a block generation stamp for every block or only for the last block of each
file. The only problem here is with prehistoric blocks (according to Dhruba;
see HADOOP-1700, HADOOP-146, HADOOP-158).
Reminder: a block is *prehistoric* if it is reported to the system after its id
was reassigned to another physical block.
This happens when a data-node is down long enough for two things to happen in
the meantime:
# a block that it owns is removed and then
# a new block is created with the same block id.
This leads to data corruption, which can be avoided if *block ids are
generated sequentially* rather than randomly.
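To make the scenario concrete, here is a minimal sketch in Python. It is not
HDFS code: ToyNameNode, replay_outage and the allocator callables are
hypothetical names invented for illustration, and the random id space is
deliberately shrunk to a single value so that the collision is certain instead
of merely possible.

```python
import itertools
import random


class ToyNameNode:
    """Hypothetical, much-simplified stand-in for the name-node block map."""

    def __init__(self, allocate_id):
        self.allocate_id = allocate_id   # callable returning a candidate block id
        self.blocks = {}                 # block id -> name of the file that owns it

    def create_block(self, filename):
        block_id = self.allocate_id()
        while block_id in self.blocks:   # only avoids ids that are live right now
            block_id = self.allocate_id()
        self.blocks[block_id] = filename
        return block_id

    def remove_block(self, block_id):
        del self.blocks[block_id]

    def accept_report(self, block_id):
        """Return the file a reported replica would be attached to, or None."""
        return self.blocks.get(block_id)


def replay_outage(namenode):
    """Replay the two steps listed above while a data-node is down."""
    old_id = namenode.create_block("old-file")   # replica lives on the data-node
    # the data-node goes down here
    namenode.remove_block(old_id)                # step 1: the block is removed
    namenode.create_block("new-file")            # step 2: a new block is created
    # the data-node comes back and reports its stale replica
    return namenode.accept_report(old_id)


# Random ids drawn from a deliberately tiny space (assumption: forces a reuse).
rng = random.Random(0)
random_result = replay_outage(ToyNameNode(lambda: rng.randrange(1)))

# Sequential ids: a counter that never hands out the same id twice.
counter = itertools.count(1)
sequential_result = replay_outage(ToyNameNode(lambda: next(counter)))

print(random_result)      # new-file (stale replica mistaken for the new block)
print(sequential_result)  # None (stale replica is recognisably obsolete)
```

With the random allocator the stale replica is silently attached to the new
file, which is the corruption case. With the counter the old id is never
handed out again, so the report matches nothing and the replica can be
discarded.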
AFAIR, the only reason for not converting to sequential ids was that we were
afraid of getting into a rather massive (distributed) upgrade that renames all
blocks in the system. This patch requires exactly such an upgrade, so I think
now is the time to adopt the proper id generation practice.
The advantage here is that we will be able to store a (smaller) generation
stamp (if any) per file rather than per block.
> Support for upgrading existing cluster to facilitate appends to HDFS files
> --------------------------------------------------------------------------
>
> Key: HADOOP-2656
> URL: https://issues.apache.org/jira/browse/HADOOP-2656
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Fix For: 0.18.0
>
> Attachments: upgradeGenStamp4.patch, upgradeGenStamp5.patch
>
>
> HADOOP-1700 describes the design for supporting appends to HDFS files. This
> design requires a distributed-upgrade to existing cluster installations. The
> design specifies that the DataNode persist the 8-byte BlockGenerationStamp in
> the block metadata file. The upgrade code will introduce this new field in
> the block metadata file and initialize this value to 0.