[ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140618#comment-17140618
 ] 

Kihwal Lee commented on HDFS-15422:
-----------------------------------

The fix is simple. 
{code}
@@ -2578,10 +2578,7 @@ private BlockInfo processReportedBlock(
         // If the block is an out-of-date generation stamp or state,
         // but we're the standby, we shouldn't treat it as corrupt,
         // but instead just queue it for later processing.
-        // TODO: Pretty confident this should be s/storedBlock/block below,
-        // since we should be postponing the info of the reported block, not
-        // the stored block. See HDFS-6289 for more context.
-        queueReportedBlock(storageInfo, storedBlock, reportedState,
+        queueReportedBlock(storageInfo, block, reportedState,
             QUEUE_REASON_CORRUPT_STATE);
       } else {
         toCorrupt.add(c);
{code}

If the old information in memory ({{storedBlock}}) is used when queueing a 
report, the queued size may be stale. Unlike GENSTAMP_MISMATCH, this kind of 
corruption cannot be undone when the NN sees a correct report again. That is, 
forcing a block report won't fix this condition. 
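Below is a minimal, self-contained sketch of why queueing {{block}} instead of {{storedBlock}} matters. It assumes a heavily simplified model: {{Block}}, {{check}} and {{simulate}} are hypothetical names, not the real BlockManager API. Generation stamps are held equal on purpose so that only the length comparison is exercised. The standby postpones a mismatching report instead of marking it corrupt. It then catches up on edits and re-checks the queued report after becoming active.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, heavily simplified model of HDFS-15422; not the real BlockManager code.
public class QueuedIbrSketch {

    // Illustrative replica description: block id, generation stamp, length in bytes.
    record Block(long id, long genStamp, long numBytes) {}

    enum Result { OK, GENSTAMP_MISMATCH, SIZE_MISMATCH }

    // Compares a reported replica against the NN's stored block:
    // generation stamp first, then length (simplified).
    static Result check(Block reported, Block stored) {
        if (reported.genStamp() != stored.genStamp()) {
            return Result.GENSTAMP_MISMATCH;
        }
        if (reported.numBytes() != stored.numBytes()) {
            return Result.SIZE_MISMATCH;
        }
        return Result.OK;
    }

    // queueReported = true models the fix (queue `block`);
    // queueReported = false models the bug (queue `storedBlock`).
    static Result simulate(boolean queueReported) {
        // Standby's in-memory view is stale: it has not yet applied the edit
        // that grew the block to 2048 bytes. Genstamps are held equal on purpose.
        Block stored = new Block(1L, 1001L, 1024L);
        // The DataNode reports the replica's actual, larger length.
        Block reported = new Block(1L, 1001L, 2048L);

        Queue<Block> pending = new ArrayDeque<>();
        if (check(reported, stored) != Result.OK) {
            // On a standby, a mismatching report is postponed, not marked corrupt.
            pending.add(queueReported ? reported : stored);
        }

        // Edit log tailing later brings the stored block up to date.
        stored = new Block(1L, 1001L, 2048L);

        // After the transition to active, postponed reports are re-checked.
        Block queued = pending.poll();
        return queued == null ? Result.OK : check(queued, stored);
    }

    public static void main(String[] args) {
        System.out.println("queue storedBlock: " + simulate(false)); // SIZE_MISMATCH
        System.out.println("queue block:       " + simulate(true));  // OK
    }
}
```

In this model, queueing the reported copy means the deferred check compares what the DataNode actually holds against the NN's caught-up state. Queueing the stored copy compares the NN's old state against its new state, which can never match once the block has grown.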

> Reported IBR is partially replaced with stored info when queuing.
> -----------------------------------------------------------------
>
>                 Key: HDFS-15422
>                 URL: https://issues.apache.org/jira/browse/HDFS-15422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting 
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
> that had been appended, and their sizes were actually correct on the 
> datanodes. Upon further investigation, we determined that the namenode was 
> queueing IBRs with altered information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
