[
https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041212#comment-17041212
]
Íñigo Goiri commented on HDFS-15187:
------------------------------------
Thanks [~ayushtkn] for the patch.
* Can we add a javadoc to processAndHandleReportedBlock explaining what true
and false mean?
* I would like a more descriptive name for "response."
* The code where we do the "continue" is a little hard to follow. What about:
{code}
private void processQueuedMessages(Iterable<ReportedBlockInfo> rbis)
    throws IOException {
  boolean response = true;
  for (ReportedBlockInfo rbi : rbis) {
    LOG.debug("Processing previously queued message {}", rbi);
    if (rbi.getReportedState() == null) {
      // This is a DELETE_BLOCK request
      DatanodeStorageInfo storageInfo = rbi.getStorageInfo();
      removeStoredBlock(getStoredBlock(rbi.getBlock()),
          storageInfo.getDatanodeDescriptor());
    } else if (!response) {
      // If the previous IBR processing was skipped, skip processing all
      // further IBRs so as to ensure the same sequence of processing.
      queueReportedBlock(rbi.getStorageInfo(), rbi.getBlock(),
          rbi.getReportedState(), QUEUE_REASON_FUTURE_GENSTAMP);
    } else {
      response = processAndHandleReportedBlock(
          rbi.getStorageInfo(), rbi.getBlock(), rbi.getReportedState(), null);
    }
  }
}
{code}
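To show the ordering idea in isolation, here is a minimal standalone sketch. It is not HDFS code: QueuedMessageSketch, drainInOrder, and tryProcess are hypothetical names, the DELETE_BLOCK branch is left out, and it assumes the caller re-queues a message whose processing was postponed (in the patch that may happen inside processAndHandleReportedBlock instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Standalone illustration; none of these names exist in the HDFS code base. */
final class QueuedMessageSketch {

  /**
   * Walks the messages in arrival order. As soon as one message cannot be
   * processed, it and every later message are re-queued untouched, so the
   * relative order of the postponed messages is preserved.
   *
   * @param pending messages in arrival order
   * @param tryProcess returns true if the message was processed, false if it
   *        had to be postponed (assumed contract, mirroring "response")
   * @return the re-queued messages, in their original order
   */
  static <T> List<T> drainInOrder(Iterable<T> pending, Predicate<T> tryProcess) {
    List<T> requeued = new ArrayList<>();
    boolean processed = true;
    for (T msg : pending) {
      if (!processed) {
        // An earlier message was postponed: do not even try this one.
        requeued.add(msg);
      } else {
        processed = tryProcess.test(msg);
        if (!processed) {
          // Assumption: the caller, not tryProcess, re-queues the message.
          requeued.add(msg);
        }
      }
    }
    return requeued;
  }
}
```

With messages 1, 2, 3, 4 and a tryProcess that postpones only message 2, the sketch processes 1, attempts 2, and re-queues 2, 3, and 4 without attempting 3 or 4.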
> CORRUPT replica mismatch between namenodes after failover
> ---------------------------------------------------------
>
> Key: HDFS-15187
> URL: https://issues.apache.org/jira/browse/HDFS-15187
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Attachments: HDFS-15187-01.patch
>
>
> A replica that the Active Namenode has marked corrupt because of
> updatePipeline is not identified as corrupt by the other Namenode after it
> fails over to Active.
> Scenario to reproduce:
> 1. Create a file; while writing, shut down one datanode to trigger a
> pipeline update.
> 2. Write some more data.
> 3. Close the file.
> 4. Restart the datanode that was shut down.
> 5. The replica on that datanode will be identified as CORRUPT and the corrupt
> count will be 1.
> 6. Fail over to the other Namenode.
> 7. Wait for all pending IBR processing.
> 8. The corrupt count will not be the same, and FSCK won't show the corrupt
> replica.
> 9. Fail back to the first Namenode.
> 10. The corrupt count and the corrupt replica will be there.
> The two Namenodes show different state.