[ 
https://issues.apache.org/jira/browse/HADOOP-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590551#action_12590551
 ] 

Konstantin Shvachko commented on HADOOP-2656:
---------------------------------------------

Maybe this is a good time to return to the question of renaming HDFS blocks: 
stop generating block ids randomly and generate them sequentially instead.
This is related to my previous question of whether (in the name-node) we need 
to store a block generation stamp for each block or only for the last block of 
each file. The only problem here is with prehistoric blocks (according to 
Dhruba; see HADOOP-1700, HADOOP-146, HADOOP-158).

Reminder: a block is *prehistoric* if it is reported to the system after its id 
was reassigned to another physical block.
This happens when a data-node is down long enough for two things to happen, 
in this order: 
# a block that it owns is removed and then
# a new block is created with the same block id.
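
To make the two steps concrete, here is a minimal sketch of the scenario. 
Everything in it is hypothetical: ToyNameNode, its methods, the file names and 
the id 42 are invented for illustration and are not the actual name-node code. 
It only assumes that the name-node matches a reported replica to a file by 
block id alone.

```python
class ToyNameNode:
    """Hypothetical, heavily simplified stand-in for the name-node block map."""

    def __init__(self):
        self.block_owner = {}  # block id -> name of the file that owns it

    def add_block(self, block_id, file_name):
        self.block_owner[block_id] = file_name

    def remove_file(self, file_name):
        # Drop every block that belongs to the removed file.
        self.block_owner = {
            b: f for b, f in self.block_owner.items() if f != file_name
        }

    def process_report(self, block_id):
        """Return the file a reported replica is attributed to, or None."""
        return self.block_owner.get(block_id)


def prehistoric_scenario():
    nn = ToyNameNode()
    nn.add_block(42, "/a")      # a data-node stores a replica of block 42 of /a
    stale_replica = (42, "/a")  # the data-node goes down holding that replica

    nn.remove_file("/a")        # step 1: the block it owns is removed
    nn.add_block(42, "/b")      # step 2: a new block gets the same id

    # The data-node comes back and reports its old replica by id only.
    reported_id, written_for = stale_replica
    attributed_to = nn.process_report(reported_id)
    return written_for, attributed_to
```

In this toy model the replica that was written for /a is attributed to /b, 
which is the corruption described here.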

This leads to data corruption, which can be avoided if *block ids are 
generated sequentially* rather than randomly.
AFAIR, the only reason for not converting to sequential ids was that we were 
afraid of getting into a rather massive (distributed) upgrade that renames all 
blocks in the system. This patch requires exactly such an upgrade, so I think 
now is the right time to adopt the right id generation practice.
The advantage here is that we will be able to store a (smaller) generation 
stamp (if any) per file rather than per block.
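
As a rough sketch of why sequential ids help: if ids only ever increase, the 
id of a removed block is never handed out again, so a late report of it 
matches nothing. SequentialBlockIds and replay_with_sequential_ids are 
hypothetical names, and the sketch assumes the counter would be persisted 
across name-node restarts, which is not shown.

```python
import itertools


class SequentialBlockIds:
    """Hypothetical allocator: ids increase monotonically, never reused.

    Assumption: a real allocator would persist the last id it issued so the
    sequence survives a restart; that part is omitted here.
    """

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_id(self):
        return next(self._counter)


def replay_with_sequential_ids():
    ids = SequentialBlockIds()
    block_owner = {}              # block id -> owning file

    old_id = ids.next_id()
    block_owner[old_id] = "/a"    # replica stored on a data-node that goes down
    del block_owner[old_id]       # the block is removed in the meantime

    new_id = ids.next_id()        # the new block gets a fresh, larger id
    block_owner[new_id] = "/b"

    # The data-node returns and reports old_id: no current block matches it.
    return old_id, new_id, block_owner.get(old_id)
```

Here the stale replica's id resolves to no file at all, so it can be 
recognized as obsolete instead of being mistaken for a replica of the new 
block.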

> Support for upgrading existing cluster to facilitate appends to HDFS files
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2656
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.18.0
>
>         Attachments: upgradeGenStamp4.patch, upgradeGenStamp5.patch
>
>
> HADOOP-1700 describes the design for supporting appends to HDFS files. This 
> design requires a distributed-upgrade to existing cluster installations. The 
> design specifies that the DataNode persist the 8-byte BlockGenerationStamp in 
> the block metadata file. The upgrade code will introduce this new field in 
> the block metadata file and initialize this value to 0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
