[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383413#comment-15383413 ]

Colin P. McCabe commented on HDFS-10301:
----------------------------------------

--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
{code}
@@ -308,10 +308,10 @@ public synchronized boolean checkLease(DatanodeDescriptor dn,
       return false;
     }
     if (node.leaseId == 0) {
-      LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
-               "is not in the pending set.",
-               Long.toHexString(id), dn.getDatanodeUuid());
-      return false;
+      LOG.debug("DN {} is not in the pending set because BR with "
+              + "lease 0x{} was processed out of order",
+          dn.getDatanodeUuid(), Long.toHexString(id));
+      return true;
     }
{code}

There are other reasons why {{node.leaseId}} might be 0, besides block reports 
getting processed out of order.  For example, an RPC could have gotten 
duplicated by something in the network.  Let's not change the existing error 
message.

{code}
            StorageBlockReport[] lastSplitReport =
                new StorageBlockReport[perVolumeBlockLists.size()];
            // When block reports are split, the last RPC in the block report
            // has the information about all storages in the block report.
            // See HDFS-10301 for more details. To achieve this, the last RPC
            // has 'n' storage reports, where 'n' is the number of storages in
            // a DN. The actual block replicas are reported only for the
            // last/n-th storage.
{code}
Why do we have to use such a complex and confusing approach?  Like I commented 
earlier, a report of the existing storages is not the same as a block report.  
Why are we creating {{BlockListAsLongs}} objects that aren't lists of blocks?

There is a much simpler approach, which is just adding a list of storage IDs to 
the block report RPC by making a backwards-compatible protobuf change.  It's 
really easy:

{code}
+repeated string allStorageIds = 8;
{code}
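
To illustrate, here is a rough sketch of how the NameNode side could consume such a field once the final RPC of a split report arrives. The accessor is what protobuf-java would generate for the proposed field; {{removeZombieStorage()}} is a hypothetical helper, not an existing HDFS method:

{code}
// Sketch only: assumes the proposed "repeated string allStorageIds" field,
// for which protobuf-java would generate getAllStorageIdsList().
// removeZombieStorage() is a hypothetical helper used for illustration.
void checkForZombieStorages(DatanodeDescriptor node,
                            BlockReportRequestProto request) {
  Set<String> reported = new HashSet<>(request.getAllStorageIdsList());
  for (DatanodeStorageInfo storage : node.getStorageInfos()) {
    if (!reported.contains(storage.getStorageID())) {
      // This storage was not listed anywhere in the (possibly split) block
      // report, so it can safely be considered for zombie removal.
      removeZombieStorage(node, storage);
    }
  }
}
{code}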

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report. Then it sends the block report again. The NameNode, while processing 
> these two reports at the same time, can interleave processing of storages 
> from different reports. This screws up the blockReportId field, which makes 
> the NameNode think that some storages are zombie. Replicas from zombie 
> storages are immediately removed, causing missing blocks.
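
To make the interleaving described above concrete, here is a simplified, self-contained model. The class and field names below are invented for illustration and are not the actual NameNode code; the point is that a storage stamped with the older report ID at the time the newer report's final RPC is processed gets flagged as zombie:

{code}
// Simplified model of the race: not real HDFS classes, names are invented.
import java.util.*;

public class ZombieRaceDemo {
  static class Storage {
    final String id;
    long lastReportId;              // ID of the last block report that touched this storage
    Storage(String id) { this.id = id; }
  }

  // Processing one per-storage RPC stamps the storage with that report's ID.
  static void processStorageRpc(Storage s, long reportId) {
    s.lastReportId = reportId;
  }

  // After the last RPC of a report, any storage not stamped with the current
  // report ID is treated as zombie and its replicas are removed.
  static List<Storage> findZombies(Collection<Storage> storages, long curReportId) {
    List<Storage> zombies = new ArrayList<>();
    for (Storage s : storages) {
      if (s.lastReportId != curReportId) {
        zombies.add(s);
      }
    }
    return zombies;
  }

  public static void main(String[] args) {
    Storage s1 = new Storage("DS-1");
    Storage s2 = new Storage("DS-2");
    long reportA = 1L;  // the original (timed-out) block report
    long reportB = 2L;  // the retransmitted block report

    // Interleaved processing on the NameNode:
    processStorageRpc(s1, reportA);
    processStorageRpc(s1, reportB);
    processStorageRpc(s2, reportA);  // s2 ends up stamped with the old report ID

    // The last RPC of report B triggers the zombie check with reportB as current:
    System.out.println(findZombies(Arrays.asList(s1, s2), reportB).size()
        + " storage(s) falsely declared zombie");  // prints "1 storage(s) ..."
  }
}
{code}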


