[ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370336#comment-14370336 ]
Colin Patrick McCabe commented on HDFS-7960:
--------------------------------------------
Just a note here: HDFS-7596 works in the non-hotplug case because we
implicitly assume that a block will be on at most one storage of a DataNode.
So when a block report says that blocks B1, B2, etc. are on some storage, we
remove them from any other storages they might be on. In the case of
HDFS-7575, where we change the storage ID of a storage, this gradually
empties the zombie storage and allows it to go away. Of course, this doesn't
work for DataNode hotplug, for the reasons I outlined above. And even where
HDFS-7596 works, it's still relatively slow and could open a window in which
we think something is replicated when it really isn't. It's better for the
full block report (FBR) to state clearly that certain storages just don't
exist any more.
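
To make the "at most one storage per block" assumption concrete, here is a
minimal sketch of the bookkeeping described above. It is a toy model, not the
actual NameNode code: the class StorageMapSketch and its methods are
hypothetical names made up for illustration, and storages are reduced to a
map from storage ID to block IDs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of one DataNode's storages as seen by the NameNode (hypothetical, not real HDFS code). */
final class StorageMapSketch {
    // storage ID -> block IDs the NameNode believes live on that storage
    private final Map<String, Set<Long>> blocksByStorage = new HashMap<>();

    /** Applies a report for one storage, enforcing "a block is on at most one storage". */
    void processReport(String storageId, Set<Long> reportedBlocks) {
        // Remove the reported blocks from every other storage of this DataNode.
        for (Map.Entry<String, Set<Long>> e : blocksByStorage.entrySet()) {
            if (!e.getKey().equals(storageId)) {
                e.getValue().removeAll(reportedBlocks);
            }
        }
        // Record the reported blocks as the contents of the reporting storage.
        blocksByStorage.put(storageId, new HashSet<>(reportedBlocks));
    }

    /** Drops storages that hold no blocks (the behavior the text attributes to HDFS-7596). */
    void pruneEmptyStorages() {
        blocksByStorage.values().removeIf(Set::isEmpty);
    }

    /** Returns a copy of the storage IDs currently tracked. */
    Set<String> storageIds() {
        return new HashSet<>(blocksByStorage.keySet());
    }
}
```

In this model, if a storage's ID changes and the same blocks are reported
under the new ID, the old entry ends up empty and pruneEmptyStorages() drops
it. If a storage is simply removed by hotplug, nothing ever reports its
blocks elsewhere, so its entry never becomes empty and it stays around as a
zombie.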
> The full block report should contain a "numStoragesOnDn" field that allows
> the NameNode to prune old storages
> -------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7960
> URL: https://issues.apache.org/jira/browse/HDFS-7960
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> The full block report should contain a "numStoragesOnDn" field that allows
> the NameNode to prune old storages. Currently, the NameNode can't do this
> pruning because it doesn't know how many storages are on the DataNode, and
> the full block report can be split into multiple RPCs.
> While it is true that storages will be removed by HDFS-7596 when they are
> empty, this mechanism doesn't work for datanode hotplug. In the case of
> datanode hotplug, an entire storage simply goes away because of
> reconfiguration. Currently, the NN may never perceive this "zombie storage"
> as being empty, unless the NN itself is restarted.
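
The numStoragesOnDn idea in the quoted description can be sketched roughly as
follows. This is only an illustration under simplifying assumptions: the
class FbrPruningSketch and the method processFbrRpc are hypothetical, each
RPC is assumed to carry the same numStoragesOnDn value, and retransmitted or
interleaved reports are ignored. The point is that the NameNode can prune
only after it has heard from as many storages as the DataNode says it has.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy NameNode-side view of one DataNode (hypothetical, not the real HDFS classes). */
final class FbrPruningSketch {
    // Storages the NameNode currently knows about: storage ID -> block IDs.
    private final Map<String, Set<Long>> blocksByStorage = new HashMap<>();
    // Storage IDs seen so far in the full block report that is in progress.
    private final Set<String> seenInCurrentFbr = new HashSet<>();

    /**
     * Handles one RPC of a full block report. The RPC carries a subset of the
     * DataNode's storages plus the total number of storages on the DataNode.
     */
    void processFbrRpc(Map<String, Set<Long>> storagesInRpc, int numStoragesOnDn) {
        for (Map.Entry<String, Set<Long>> e : storagesInRpc.entrySet()) {
            blocksByStorage.put(e.getKey(), new HashSet<>(e.getValue()));
            seenInCurrentFbr.add(e.getKey());
        }
        // Prune only once every storage the DataNode claims to have has reported.
        if (seenInCurrentFbr.size() >= numStoragesOnDn) {
            blocksByStorage.keySet().retainAll(seenInCurrentFbr);
            seenInCurrentFbr.clear();
        }
    }

    /** Returns a copy of the storage IDs currently tracked. */
    Set<String> storageIds() {
        return new HashSet<>(blocksByStorage.keySet());
    }
}
```

With this shape, a storage removed by hotplug disappears from the NameNode's
view as soon as the next full block report completes, even when that report
is split across several RPCs, and without waiting for the storage to look
empty.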