[jira] [Commented] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover

Jira Thu, 20 Feb 2020 10:22:04 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041212#comment-17041212
 ]


Íñigo Goiri commented on HDFS-15187:
------------------------------------

Thanks [~ayushtkn] for the patch.
* Can we add a javadoc to processAndHandleReportedBlock explaining what true 
and false mean?
* I would like a more descriptive name for "response".
* The code where we do the "continue" is a little hard to follow. What about:
{code}
  private void processQueuedMessages(Iterable<ReportedBlockInfo> rbis)
      throws IOException {
    boolean response = true;
    for (ReportedBlockInfo rbi : rbis) {
      LOG.debug("Processing previously queued message {}", rbi);
      if (rbi.getReportedState() == null) {
        // This is a DELETE_BLOCK request
        DatanodeStorageInfo storageInfo = rbi.getStorageInfo();
        removeStoredBlock(getStoredBlock(rbi.getBlock()),
            storageInfo.getDatanodeDescriptor());
      } else if (!response) {
        // If the previous IBR processing was skipped, skip processing all
        // further IBRs to ensure the same processing sequence.
        queueReportedBlock(rbi.getStorageInfo(), rbi.getBlock(),
            rbi.getReportedState(), QUEUE_REASON_FUTURE_GENSTAMP);
      } else {
        response = processAndHandleReportedBlock(
            rbi.getStorageInfo(), rbi.getBlock(), rbi.getReportedState(), null);
      }
    }
  }
{code}
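
To make the intended ordering behaviour easier to review, here is a minimal, self-contained sketch of the same control flow. It is a toy model, not Hadoop code: QueuedMessageSketch, Message, isDelete, processed, requeued and the canProcess predicate are all hypothetical names, and it assumes that a false return from processAndHandleReportedBlock means the message was put back on the queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of the loop proposed above; none of these names exist in Hadoop. */
class QueuedMessageSketch {

  /** Hypothetical stand-in for ReportedBlockInfo. */
  static final class Message {
    final String name;
    final boolean isDelete; // models getReportedState() == null

    Message(String name, boolean isDelete) {
      this.name = name;
      this.isDelete = isDelete;
    }
  }

  final List<String> processed = new ArrayList<>();
  final List<String> requeued = new ArrayList<>();

  /**
   * @param canProcess hypothetical stand-in for processAndHandleReportedBlock;
   *                   assumed to return false when the message had to be queued again
   */
  void processQueuedMessages(List<Message> messages, Predicate<Message> canProcess) {
    boolean lastProcessed = true;
    for (Message m : messages) {
      if (m.isDelete) {
        // Delete requests are always handled, as in the first branch above.
        processed.add("delete:" + m.name);
      } else if (!lastProcessed) {
        // An earlier message was postponed, so postpone this one too;
        // this keeps the relative order of the reports intact.
        requeued.add(m.name);
      } else {
        lastProcessed = canProcess.test(m);
        if (lastProcessed) {
          processed.add(m.name);
        } else {
          requeued.add(m.name);
        }
      }
    }
  }

  public static void main(String[] args) {
    QueuedMessageSketch sketch = new QueuedMessageSketch();
    sketch.processQueuedMessages(
        Arrays.asList(
            new Message("a", false),
            new Message("b", false),
            new Message("c", true),
            new Message("d", false)),
        m -> !m.name.equals("b")); // pretend "b" cannot be processed yet
    System.out.println("processed = " + sketch.processed);
    System.out.println("requeued  = " + sketch.requeued);
  }
}
```

In this run "a" and the delete for "c" are handled, while "b" and everything after it that is not a delete ("d") go back on the queue in their original order.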



> CORRUPT replica mismatch between namenodes after failover
> ---------------------------------------------------------
>
>                 Key: HDFS-15187
>                 URL: https://issues.apache.org/jira/browse/HDFS-15187
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>         Attachments: HDFS-15187-01.patch
>
>
> A corrupt replica identified by the Active Namenode is not identified by the 
> other Namenode after it fails over to Active, when the replica was 
> marked corrupt due to updatePipeline.
> Scenario to repro:
> 1. Create a file; while writing, turn one datanode down to trigger update 
> pipeline.
> 2. Write some more data.
> 3. Close the file.
> 4. Turn on the shutdown datanode.
> 5. The replica in the datanode will be identified as CORRUPT and the corrupt 
> count will be 1.
> 6. Fail over to the other Namenode.
> 7. Wait for all pending IBR processing.
> 8. The corrupt count will not be the same, and FSCK won't show the corrupt 
> replica.
> 9. Fail back over to the first Namenode.
> 10. Corrupt count and corrupt replica will be there.
> The two Namenodes report different state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

