[
https://issues.apache.org/jira/browse/HDFS-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565884#comment-13565884
]
Suresh Srinivas edited comment on HDFS-4212 at 1/29/13 10:31 PM:
-----------------------------------------------------------------
bq. Brandon. The problem mentioned in your original description seems not to be
a problem at all. Because client never knows whether block was created or not
until it gets a reply from NN. If NN crashes before replying the block will be
correctly reported as missing on restart if it was created. This is the nature
of distributed computing.
Actually it is a problem related to HDFS-4452. When a client does not get a
response to getAdditionalBlock(), it retries. Because getAdditionalBlock() is
currently not idempotent, each retry can allocate a new block. This causes the
namenode to report corruption for open files. I think changing
getAdditionalBlock() to take an offset, as suggested by Brandon, will make it
idempotent. On a retry for the same offset from the same client, the namenode
can return the block that has already been allocated instead of creating a new
one.
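
The retry behaviour described above could be sketched roughly as follows. This
is only an illustration, not HDFS code: the class IdempotentBlockAllocator, its
RequestKey helper, the blockCount() method, and the
(clientName, path, offset) signature are all hypothetical, and the real
getAdditionalBlock() takes different parameters. The idea shown is that the
namenode remembers which block it allocated for a given client, file, and
offset, and hands the same block back when the same request arrives again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the namenode's block allocation path; not HDFS code. */
class IdempotentBlockAllocator {

    /** Identifies one allocation request: who asked, for which file, at which offset. */
    private static final class RequestKey {
        final String clientName;
        final String path;
        final long offset;

        RequestKey(String clientName, String path, long offset) {
            this.clientName = clientName;
            this.path = path;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RequestKey)) return false;
            RequestKey other = (RequestKey) o;
            return offset == other.offset
                    && clientName.equals(other.clientName)
                    && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(clientName, path, offset);
        }
    }

    // Blocks already handed out, keyed by the request that produced them.
    private final Map<RequestKey, Long> allocated = new HashMap<>();
    private long nextBlockId = 1;

    /**
     * Returns the block id for the given client, file, and offset.
     * A repeated call with the same arguments (a client retry) returns the
     * block allocated the first time instead of allocating a new one.
     */
    synchronized long getAdditionalBlock(String clientName, String path, long offset) {
        RequestKey key = new RequestKey(clientName, path, offset);
        Long existing = allocated.get(key);
        if (existing != null) {
            return existing; // retry: reuse the earlier allocation
        }
        long blockId = nextBlockId++;
        allocated.put(key, blockId);
        return blockId;
    }

    /** Number of distinct blocks allocated so far. */
    synchronized int blockCount() {
        return allocated.size();
    }
}
```

With this sketch, calling getAdditionalBlock("client-1", "/a", 0) twice yields
the same block id and leaves blockCount() at 1, while a call with a different
offset allocates a second block. A real change would also have to persist this
state across a namenode restart, which the sketch does not attempt.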
was (Author: sureshms):
bq. Brandon. The problem mentioned in your original description seems not
to be a problem at all. Because client never knows whether block was created or
not until it gets a reply from NN. If NN crashes before replying the block will
be correctly reported as missing on restart if it was created. This is the
nature of distributed computing.
Actually it is a problem related to HDFS-4452. When the client does not know
whether the block was created or not, it retries. As getAdditionalBlock()
stands currently, it is really not idempotent. I think adding
getAdditionalBlock at an offset as suggested by Brandon will make it
idempotent. Hence on retry, for the same offset, from the same client, the
namenode can return the block that has been allocated, instead of creating a
new one.
> NameNode can't differentiate between a never-created block and a block which
> is really missing
> ----------------------------------------------------------------------------------------------
>
> Key: HDFS-4212
> URL: https://issues.apache.org/jira/browse/HDFS-4212
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 1.2.0, 3.0.0
> Reporter: Brandon Li
> Assignee: Brandon Li
> Attachments: hdfs-4212-junit-test.patch
>
>
> In one test case, the NameNode allocated a block and was then killed before
> the client got the addBlock response.
> After the NameNode restarted, the block, which was never created, was treated
> as a missing block, and FSCK reported the file as corrupt.
> The problem seems to be that the NameNode can't differentiate between a
> never-created block and a block which is really missing.