[ 
https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541061#comment-14541061
 ] 

Colin Patrick McCabe commented on HDFS-8380:
--------------------------------------------

Background: HDFS-6830 attempted to implement "block shifting logic": when the 
NameNode received a report that a replica was in a particular DataNode 
storage, it would update its internal data structures to reflect that the 
replica was no longer in any other storage on that DataNode.  The assumption 
was (and still is) that each replica is present in at most one storage on 
each DN (an assumption we might want to revisit at some point, but that's 
outside the scope of this JIRA...).
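
To make the intended shifting behavior concrete, here is a minimal sketch 
under the "at most one storage per DN" assumption.  The class and method 
names ({{ReplicaLocationSketch}}, {{reportReplica}}, {{storageOf}}) are 
hypothetical and are not HDFS code; the real NameNode data structures are 
considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NameNode's replica bookkeeping.
// These names are illustrative only; they are not HDFS classes.
class ReplicaLocationSketch {
    // blockId -> (datanodeId -> storageId): at most one storage per DataNode.
    private final Map<Long, Map<String, String>> locations = new HashMap<>();

    // Records a reported replica. Returns the storage the replica was shifted
    // from on this DataNode, or null if nothing was shifted.
    String reportReplica(long blockId, String datanodeId, String storageId) {
        Map<String, String> perDatanode =
            locations.computeIfAbsent(blockId, id -> new HashMap<>());
        // put() overwrites any older storage recorded for this DataNode.
        String previous = perDatanode.put(datanodeId, storageId);
        return storageId.equals(previous) ? null : previous;
    }

    // Returns the storage currently recorded for the block on this DataNode,
    // or null if the block is not known to be on that DataNode.
    String storageOf(long blockId, String datanodeId) {
        Map<String, String> perDatanode = locations.get(blockId);
        return perDatanode == null ? null : perDatanode.get(datanodeId);
    }

    public static void main(String[] args) {
        ReplicaLocationSketch nn = new ReplicaLocationSketch();
        nn.reportReplica(1001L, "dn1", "storageA");
        String shiftedFrom = nn.reportReplica(1001L, "dn1", "storageB");
        System.out.println("shifted from " + shiftedFrom
            + " to " + nn.storageOf(1001L, "dn1"));
    }
}
```

Keying the inner map by DataNode is what enforces the assumption in this 
sketch: a second report from the same DN replaces the old storage instead of 
adding a second location.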

HDFS-6830 was flawed, however.  Although it changed {{BlockManager#addBlock}} 
to update the storage which a particular block was in, it did not actually 
call {{BlockManager#addBlock}} on blocks received in a full block report if 
it had already seen their IDs.  So when blocks were moved between storages, 
HDFS-6830 did not update the internal data structures on the NameNode: the 
blocks remained recorded in their old storages.

HDFS-6991, although it would appear to be unrelated based on the title, 
actually has a partial fix for the bug in HDFS-6830, in the form of this code:

{code}
-        && (!storedBlock.findDatanode(dn)
-        || corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
+        && (storedBlock.findStorageInfo(storageInfo) == -1 ||
+            corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
                  addBlock(...)
{code}
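
The difference between the two conditions can be sketched as follows.  This 
is a simplified model, not the HDFS implementation: {{Storage}}, 
{{StoredBlock}}, {{shouldAddPerDatanode}} and {{shouldAddPerStorage}} are 
hypothetical names, and the corrupt-replica clause and the other conditions 
in the real check are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the check in front of addBlock(...).
class StorageCheckSketch {
    static final class Storage {
        final String datanodeId;
        final String storageId;

        Storage(String datanodeId, String storageId) {
            this.datanodeId = datanodeId;
            this.storageId = storageId;
        }
    }

    static final class StoredBlock {
        private final List<Storage> storages = new ArrayList<>();

        void addStorage(Storage s) {
            storages.add(s);
        }

        // True if any recorded storage belongs to the given DataNode.
        boolean isOnDatanode(String datanodeId) {
            for (Storage s : storages) {
                if (s.datanodeId.equals(datanodeId)) {
                    return true;
                }
            }
            return false;
        }

        // Index of the exact storage, or -1 if the block is not recorded there.
        int findStorage(Storage target) {
            for (int i = 0; i < storages.size(); i++) {
                Storage s = storages.get(i);
                if (s.datanodeId.equals(target.datanodeId)
                        && s.storageId.equals(target.storageId)) {
                    return i;
                }
            }
            return -1;
        }
    }

    // Old-style check: only asks whether the DataNode already has the block.
    static boolean shouldAddPerDatanode(StoredBlock block, Storage reported) {
        return !block.isOnDatanode(reported.datanodeId);
    }

    // New-style check: asks whether this exact storage already has the block.
    static boolean shouldAddPerStorage(StoredBlock block, Storage reported) {
        return block.findStorage(reported) == -1;
    }

    public static void main(String[] args) {
        StoredBlock block = new StoredBlock();
        block.addStorage(new Storage("dn1", "storageA"));
        Storage moved = new Storage("dn1", "storageB");
        System.out.println("per-DataNode check: " + shouldAddPerDatanode(block, moved));
        System.out.println("per-storage check:  " + shouldAddPerStorage(block, moved));
    }
}
```

For a replica that moved from one storage to another on the same DataNode, 
the per-DataNode check skips the block while the per-storage check lets it 
through, which is the case the shifting logic depends on.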

However, HDFS-6991 doesn't fix the issue for RBW (replica being written) 
blocks.  Admittedly, it is much less likely for RBW blocks to be shifted 
between storages, because when a datanode restarts, its RBW replicas become 
RWR (replica waiting to be recovered).  However, for the sake of robustness, 
we should implement the shifting behavior there too.

This patch does that.  It also logs the first time we receive a storage 
report for a given storage.  This should happen only once per storage, so it 
won't generate too many log messages, and it will be useful for tracing what 
is going on.  The patch also adds debug logs to the initial storage report, 
similar to the debug logs available for the non-initial storage report.  
Finally, it adds a unit test for the shifting behavior.  The unit test covers 
shifting of finalized blocks rather than RBW ones, so it doesn't require the 
rest of the patch to pass, but it's still very useful for preventing 
regressions.

> Always call addStoredBlock on blocks which have been shifted from one storage 
> to another
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8380
>                 URL: https://issues.apache.org/jira/browse/HDFS-8380
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8380.001.patch
>
>
> We should always call addStoredBlock on blocks which have been shifted from 
> one storage to another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
