[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403180#comment-13403180 ]
Tsz Wo (Nicholas), SZE commented on HDFS-3122:
----------------------------------------------
Hi Uma,
I may have missed something: at the time of processing the block report (BR),
both the block stored in NN and the replica stored in DN have the newer
genstamp. We should have the following:
- NN side: NN has a newer genstamp but DN reported a replica with an older
genstamp. NN should tell DN to delete the replica with the older genstamp, and
the stored block in NN remains unchanged.
- DN side: DN receives a replica-delete with the older genstamp from NN but the
stored genstamp is newer, so it ignores the replica-delete.
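The two bullets above can be sketched as a small stand-alone model. This is only an illustration of the expected behavior under simplifying assumptions, not the actual NameNode or DataNode code: the class, the enum and both method names are hypothetical, and genstamps are modeled as plain longs.

```java
public class GenstampSketch {
    /** What the NN decides for a reported replica (hypothetical enum). */
    enum NnAction { KEEP, DELETE_REPORTED_REPLICA }

    /**
     * NN side (hypothetical helper): compare the stored genstamp with the
     * reported one. The stored block itself is never modified here.
     * Only the "reported is older" case from the comment is modeled.
     */
    static NnAction onBlockReport(long storedGenstamp, long reportedGenstamp) {
        return reportedGenstamp < storedGenstamp
                ? NnAction.DELETE_REPORTED_REPLICA
                : NnAction.KEEP;
    }

    /**
     * DN side (hypothetical helper): honor a replica-delete only if it names
     * the genstamp the DN currently stores.
     */
    static boolean shouldDelete(long localGenstamp, long genstampInDeleteCommand) {
        return localGenstamp == genstampInDeleteCommand;
    }

    public static void main(String[] args) {
        long oldGs = 1001L;
        long newGs = 1002L;

        // NN already has the newer genstamp; the delayed report carries the older one.
        NnAction action = onBlockReport(newGs, oldGs);

        // DN already holds the newer genstamp, so the delete for the older one is ignored.
        boolean deleted = action == NnAction.DELETE_REPORTED_REPLICA
                && shouldDelete(newGs, oldGs);

        System.out.println(action + " deleted=" + deleted);
    }
}
```

In this model the good replica survives on the DN and the stored block on the NN is untouched, which is the outcome the two bullets describe.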
There may be some bugs in the implementation. What happens after the block is
marked as corrupt? Will DN delete the replica? Or will NN remove the DN
from the block location list?
> Block recovery with closeFile flag true can race with blockReport. Due to
> this blocks are getting marked as corrupt.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3122
> URL: https://issues.apache.org/jira/browse/HDFS-3122
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Priority: Critical
> Attachments: blockCorrupt.txt
>
>
> A *Block Report* can *race* with *Block Recovery* when the closeFile flag is true.
> The DN generates a block report just before block recovery, and because of
> network problems the report is delayed on its way to the NN.
> Meanwhile, recovery succeeds and the generation stamp is updated to a new one.
> The primary DN then invokes commitBlockSynchronization, and the block is updated
> on the NN side with the new genstamp. The block is also marked as complete,
> since the closeFile flag was true.
> Now the NN starts processing the delayed block report. The reported replica is
> in RBW state (its state when the DN generated the BR), while the file has
> already been completed on the NN side.
> Finally, the block is marked as corrupt because of the genstamp mismatch.
> {code}
>     case RWR:
>       if (!storedBlock.isComplete()) {
>         return null; // not corrupt
>       } else if (storedBlock.getGenerationStamp() !=
>           iblk.getGenerationStamp()) {
>         return new BlockToMarkCorrupt(storedBlock,
>             "reported " + reportedState + " replica with genstamp " +
>             iblk.getGenerationStamp() + " does not match COMPLETE block's " +
>             "genstamp in block map " + storedBlock.getGenerationStamp());
>       } else { // COMPLETE block, same genstamp
> {code}
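The sequence in the quoted description can be replayed with a simplified, stand-alone model. The types and the wouldMarkCorrupt helper below are hypothetical and only mirror the shape of the quoted check (complete block plus genstamp mismatch); they are not the real BlockManager classes, and the genstamp values are made up.

```java
public class StaleReportRace {
    enum ReplicaState { FINALIZED, RBW, RWR }

    /** Snapshot of a replica as captured in a block report (hypothetical type). */
    static final class ReportedReplica {
        final long genstamp;
        final ReplicaState state;

        ReportedReplica(long genstamp, ReplicaState state) {
            this.genstamp = genstamp;
            this.state = state;
        }
    }

    /** NN-side view of the block (hypothetical type). */
    static final class StoredBlock {
        long genstamp;
        boolean complete;

        StoredBlock(long genstamp, boolean complete) {
            this.genstamp = genstamp;
            this.complete = complete;
        }
    }

    /** Simplified form of the quoted check for a non-finalized reported replica. */
    static boolean wouldMarkCorrupt(StoredBlock stored, ReportedReplica reported) {
        if (!stored.complete) {
            return false; // not corrupt: block is still under construction
        }
        return stored.genstamp != reported.genstamp;
    }

    public static void main(String[] args) {
        StoredBlock stored = new StoredBlock(1001L, false);

        // 1. DN builds the block report before recovery; the report is then delayed.
        ReportedReplica delayed = new ReportedReplica(1001L, ReplicaState.RBW);

        // 2. Recovery succeeds; commitBlockSynchronization with closeFile=true
        //    updates the genstamp and completes the block on the NN.
        stored.genstamp = 1002L;
        stored.complete = true;

        // 3. NN finally processes the delayed report.
        System.out.println("markCorrupt=" + wouldMarkCorrupt(stored, delayed));
    }
}
```

Had the same report arrived before step 2, the block would still have been under construction and the check would have returned "not corrupt", which is why only the delayed ordering triggers the problem.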