[ https://issues.apache.org/jira/browse/HDFS-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459935#comment-15459935 ]

Manoj Govindassamy commented on HDFS-10819:
-------------------------------------------

[~andrew.wang],

Thanks for reviewing the patch.

{quote}We need to have a collision between two genstamps of the same 
block.{quote}
More importantly, if the same storage volume in a DataNode happens to hold a 
block under several generation stamps, then without the fix the NameNode will 
not "store" the replica with the more recent (higher) generation stamp.
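
To make the genstamp relationship concrete, here is a minimal standalone sketch (the names and return strings are illustrative, not the actual BlockManager API): a replica whose reported generation stamp trails the block's current one is flagged as a genstamp mismatch, while a report carrying the current or a newer genstamp should be stored.

```java
public class GenStampSketch {
    // Illustrative classification of a replica report by generation stamp.
    // A lower reported genstamp means the replica is stale/corrupt; an
    // equal or higher one should be accepted and stored by the NameNode.
    static String classify(long reportedGs, long currentGs) {
        if (reportedGs < currentGs) {
            return "GENSTAMP_MISMATCH"; // replica marked corrupt
        }
        return "store/update replica";
    }

    public static void main(String[] args) {
        System.out.println(classify(5, 7)); // stale replica -> GENSTAMP_MISMATCH
        System.out.println(classify(8, 7)); // recovered, newer genstamp -> store
    }
}
```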

{quote}Would this also be addressed by having the NN first invalidate the 
corrupt replica before replicating the correct one{quote}
{{BlockManager#markBlockAsCorrupt}} already tries to invalidate corrupt 
blocks. But block invalidations are postponed if any replica is stale, so the 
corrupt replica might not be invalidated for some time, which delays the block 
from reaching its replication factor.
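
The postponement above can be modeled with a small standalone sketch (names are illustrative, not the real BlockManager data structures): invalidation of a corrupt replica proceeds only when no replica of the block sits on a stale node, i.e. one the NameNode has not yet received a block report from.

```java
public class InvalidationSketch {
    // Invalidate the corrupt replica now only if no replica of the block
    // lives on a stale node; otherwise the invalidation is postponed,
    // which in turn delays re-replication of the block.
    static boolean canInvalidateNow(boolean[] replicaIsStale) {
        for (boolean stale : replicaIsStale) {
            if (stale) {
                return false; // postponed until the stale node reports
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One replica is on a stale node, so invalidation waits.
        System.out.println(canInvalidateNow(new boolean[]{false, true})); // false
    }
}
```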

{quote}Also curious, would invalidation eventually fix this case, or is it 
truly stuck?{quote}
{code}
    // add block to the datanode
    AddBlockResult result = storageInfo.addBlock(storedBlock, reportedBlock);

    if (result == AddBlockResult.ADDED) {
    .. ..
    } else if (result == AddBlockResult.REPLACED) {
    .. .. 
    } else {
      // if the same block is added again and the replica was corrupt
      // previously because of a wrong gen stamp, remove it from the
      // corrupt block list.
      corruptReplicas.removeFromCorruptReplicasMap(block, node,
          Reason.GENSTAMP_MISMATCH);
      curReplicaDelta = 0;
      blockLog.debug("BLOCK* addStoredBlock: Redundant addStoredBlock request"
              + " received for {} on node {} size {}", storedBlock, node,
          storedBlock.getNumBytes());
    }
{code}

As you can see above, {{BlockManager#addStoredBlock}} already has code to 
handle the case we are interested in -- a block with the latest GS on the same 
storage volume. However, the caller 
{{BlockManager#addStoredBlockUnderConstruction}} is mistakenly skipping the 
block, which prevents that code from handling the case properly. I haven't 
fully explored the invalidation path and am not sure whether it would solve 
the problem for testRemoveVolumeBeingWrittenForDatanode. Please let me know if 
I need to explore this path.
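
The effect of the skip can be sketched in a self-contained model (class and key names are hypothetical stand-ins, not the real Hadoop code): the {{else}} branch quoted above clears a stale GENSTAMP_MISMATCH marker when the same block is added again on the same storage, but that branch never runs if the caller skips the replica.

```java
import java.util.HashSet;
import java.util.Set;

public class FixSketch {
    // Corrupt-replica markers keyed by "block@node" -- an illustrative
    // stand-in for corruptReplicas in BlockManager.
    final Set<String> corruptReplicas = new HashSet<>();

    // Models the `else` branch quoted above: a redundant add of the same
    // block on the same storage clears the stale GENSTAMP_MISMATCH marker.
    void addStoredBlock(String key) {
        corruptReplicas.remove(key);
    }

    // Before the fix, the caller skipped a replica already known on this
    // storage, so addStoredBlock never ran and the marker stayed forever.
    void addStoredBlockUnderConstruction(String key, boolean fixed) {
        boolean alreadyOnThisStorage = true; // same volume reports again
        if (alreadyOnThisStorage && !fixed) {
            return; // bug: good replica skipped, replication stays stuck
        }
        addStoredBlock(key);
    }

    public static void main(String[] args) {
        FixSketch bm = new FixSketch();
        bm.corruptReplicas.add("blk_1@dn1");
        bm.addStoredBlockUnderConstruction("blk_1@dn1", false);
        System.out.println(bm.corruptReplicas.contains("blk_1@dn1")); // true: stuck
        bm.addStoredBlockUnderConstruction("blk_1@dn1", true);
        System.out.println(bm.corruptReplicas.contains("blk_1@dn1")); // false: cleared
    }
}
```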



> BlockManager fails to store a good block for a datanode storage after it 
> reported a corrupt block — block replication stuck
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10819
>                 URL: https://issues.apache.org/jira/browse/HDFS-10819
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10819.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can run 
> into issues such as timeouts or an unreachable datanode; in this test case 
> the failure is induced, since one of the volumes in a datanode is removed 
> while a block write is in progress. Digging further into the logs, when the 
> problem happens in the write pipeline, error recovery does not proceed as 
> expected, so block replication never catches up.
> Though this problem has the same signature as HDFS-10780, from the logs it 
> looks like the code paths taken are totally different, so the root cause 
> could be different as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
