[ https://issues.apache.org/jira/browse/HDFS-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459935#comment-15459935 ]
Manoj Govindassamy commented on HDFS-10819:
-------------------------------------------

[~andrew.wang], thanks for reviewing the patch.

{quote}We need to have a collision between two genstamps of the same block.{quote}

More importantly, if the same storage volume in a DN happens to hold a block under several genstamps, then without the fix the NN will not "store" the replica with the more recent/higher genstamp.

{quote}Would this also be addressed by having the NN first invalidate the corrupt replica before replicating the correct one{quote}

{{BlockManager#markBlockAsCorrupt}} already tries to invalidate corrupt blocks. But block invalidations are postponed if any of the replicas is stale, so the corrupt replica might not be invalidated for some time, delaying the block from reaching its replication factor.

{quote}Also curious, would invalidation eventually fix this case, or is it truly stuck?{quote}

{code}
    // add block to the datanode
    AddBlockResult result = storageInfo.addBlock(storedBlock, reportedBlock);
    if (result == AddBlockResult.ADDED) {
      ..
      ..
    } else if (result == AddBlockResult.REPLACED) {
      ..
      ..
    } else {
      // if the same block is added again and the replica was corrupt
      // previously because of a wrong gen stamp, remove it from the
      // corrupt block list.
      corruptReplicas.removeFromCorruptReplicasMap(block, node,
          Reason.GENSTAMP_MISMATCH);
      curReplicaDelta = 0;
      blockLog.debug("BLOCK* addStoredBlock: Redundant addStoredBlock request"
          + " received for {} on node {} size {}", storedBlock, node,
          storedBlock.getNumBytes());
    }
{code}

As you can see above, there is already code in {{BlockManager#addStoredBlock}} to handle the case we are interested in -- a block with the latest GS on the same storage volume. Except the caller, {{BlockManager#addStoredBlockUnderConstruction}}, mistakenly skips the block and does not let this code handle the case properly. I haven't fully explored the invalidation path and am not sure whether it would solve the problem for testRemoveVolumeBeingWrittenForDatanode.
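To make the scenario concrete, below is a minimal stand-alone Java sketch (not Hadoop code; the class name, map layout, and genstamp values are all illustrative) of the idea behind that branch: when a replica for the same (block, node) pair is reported again with the expected generation stamp, a stale GENSTAMP_MISMATCH entry is cleared so the good replica can be stored instead of staying stuck as corrupt.

{code}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-alone model of clearing a stale corrupt-replica entry
// when the same storage later reports the block with the current genstamp.
public class GenstampModel {
  static class Replica {
    final long blockId;
    final long genStamp;
    Replica(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // (blockId, node) pairs currently marked corrupt for a genstamp mismatch,
  // mapped to the stale genstamp that was reported.
  static final Map<String, Long> corruptMap = new HashMap<>();

  static String key(long blockId, String node) {
    return blockId + "@" + node;
  }

  // Models the "redundant addStoredBlock" branch: a report carrying the
  // expected genstamp clears any stale GENSTAMP_MISMATCH entry and stores
  // the replica; a mismatched genstamp marks the replica corrupt instead.
  static boolean addStoredBlock(Replica r, String node, long expectedGenStamp) {
    if (r.genStamp == expectedGenStamp) {
      corruptMap.remove(key(r.blockId, node));
      return true;   // replica stored
    }
    corruptMap.put(key(r.blockId, node), r.genStamp);
    return false;    // replica rejected as corrupt
  }

  public static void main(String[] args) {
    String node = "dn1:vol0";
    // First report carries a stale genstamp (1 vs expected 2) -> corrupt.
    boolean first = addStoredBlock(new Replica(1001L, 1L), node, 2L);
    // Second report from the *same* storage carries the current genstamp;
    // without clearing the corrupt entry the block would stay stuck.
    boolean second = addStoredBlock(new Replica(1001L, 2L), node, 2L);
    System.out.println(first + " " + second + " " + corruptMap.isEmpty());
  }
}
{code}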
Please let me know if I need to explore this path.

> BlockManager fails to store a good block for a datanode storage after it
> reported a corrupt block — block replication stuck
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10819
>                 URL: https://issues.apache.org/jira/browse/HDFS-10819
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10819.001.patch
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can run into
> issues such as timeouts or an unreachable datanode; in this test case the
> failure is an induced one, as one of the volumes in a datanode is removed
> while a block write is in progress. Digging further into the logs, when the
> problem happens in the write pipeline, the error recovery is not happening as
> expected, leading to block replication never catching up.
> Though this problem has the same signature as HDFS-10780, from the logs it
> looks like the code paths taken are totally different, and so the root cause
> could be different as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)