[ 
https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376878#comment-14376878
 ] 

Colin Patrick McCabe commented on HDFS-7960:
--------------------------------------------

bq. Yi wrote: In the patch, rpcsSeen is calculated in the NN by counting all 
RPCs of the same block report, which is not safe for split reports. 
DatanodeProtocol#blockReport is @Idempotent; if a retry happens, if (rpcsSeen >= 
context.getTotalRpcs()) can be true while some datanode storages have not yet 
sent their splits of the report. In that case, these datanode storages will be 
treated as zombies and wrongly removed from the NN.

Thanks, that's a good point.  We should make sure that these RPCs stay 
idempotent.  I like [~eddyxu]'s solution of using a bitset to track which parts 
were received.
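
To make the idea concrete, here is a minimal sketch of tracking received parts with a bitset instead of a counter. It is not taken from the patch: the class name SplitReportTracker, the method names, and the assumption that each RPC carries its own index (curRpc) alongside the total are all hypothetical. The point is only that a retried RPC sets a bit that is already set, so it cannot push the report to "complete" the way incrementing rpcsSeen can.

```java
import java.util.BitSet;

/**
 * Hypothetical helper (not from the HDFS-7960 patch): records which RPCs of
 * a split block report have arrived, keyed by their index within the report.
 */
final class SplitReportTracker {
    private final int totalRpcs;
    private final BitSet received;

    SplitReportTracker(int totalRpcs) {
        if (totalRpcs <= 0) {
            throw new IllegalArgumentException("totalRpcs must be positive");
        }
        this.totalRpcs = totalRpcs;
        this.received = new BitSet(totalRpcs);
    }

    /**
     * Marks the RPC with the given index as received.
     * Returns true only the first time that index is seen, so a retry
     * of the same RPC is a no-op for completeness tracking.
     */
    boolean markReceived(int curRpc) {
        if (curRpc < 0 || curRpc >= totalRpcs) {
            throw new IllegalArgumentException("RPC index out of range: " + curRpc);
        }
        boolean firstTime = !received.get(curRpc);
        received.set(curRpc);
        return firstTime;
    }

    /** True once every distinct part of the report has arrived. */
    boolean isComplete() {
        return received.cardinality() == totalRpcs;
    }
}
```

With this shape, zombie pruning would run only when isComplete() returns true, so a storage whose split has not arrived yet is never mistaken for a zombie because some other split was retried.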

bq. Yi wrote: While removing a stored block, we'd better remove it from 
InvalidateBlocks too.

Very good point.

bq. I attempted to update the patch to address Yi Liu's comments, and also 
fixed the test failure in TestNNHandlesBlockReportPerStorage.

Thanks, Eddy.

> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7960
>                 URL: https://issues.apache.org/jira/browse/HDFS-7960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch, 
> HDFS-7960.004.patch, HDFS-7960.005.patch, HDFS-7960.006.patch, 
> HDFS-7960.007.patch
>
>
> The full block report should prune zombie storages even if they're not empty. 
>  We have seen cases in production where zombie storages have not been pruned 
> subsequent to HDFS-7575.  This could arise any time the NameNode thinks there 
> is a block in some old storage which is actually not there.  In this case, 
> the block will not show up in the "new" storage (once old is renamed to new) 
> and the old storage will linger forever as a zombie, even with the HDFS-7596 
> fix applied.  This also happens with datanode hotplug, when a drive is 
> removed.  In this case, an entire storage (volume) goes away but the blocks 
> do not show up in another storage on the same datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
