[
https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216155#comment-17216155
]
Jinglun commented on HDFS-15605:
--------------------------------
Hi [~ayushtkn], thanks for your nice comments! If I understand correctly, your
suggestion is to add some if-else branches so that the DeadNodeDetector can
have different behaviors. The purpose is to keep the main structure unchanged
for better stability and compatibility.
When I first started working on this I did think about using some if-else
branches to let the DeadNodeDetector update dead nodes from the NameNode. In
the end I chose the current approach because:
1. It preserves the basic structure of the DeadNodeDetector. Adding the logic
of InServiceDetector with many if-else conditions would make the
DeadNodeDetector logic unclear and even harder to understand. The
DeadNodeDetector maintains many states, sets and threads. If we choose to
update dead nodes from the NameNode, all of these states and threads become
irrelevant, and I'm afraid skipping them would need many if-else conditions.
2. It makes the DeadNodeDetector flexible. As you suggested, in the future we
might consider adding a new detector that detects dead nodes by fetching the
block locations. So I think using an abstract class might be a good choice.
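To illustrate the abstract-class idea, here is a minimal sketch. It is not the
code in the patch: DetectorSketch, NameNodeReportDetectorSketch and the
Supplier standing in for the NameNode report are all hypothetical names, and
datanodes are represented by plain strings to keep the example self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical base class: it only owns the shared dead-node set. */
abstract class DetectorSketch {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

  /** Each strategy decides how it learns which datanodes are dead. */
  protected abstract Set<String> detectDeadNodes();

  /** Bring the shared set in line with the strategy's latest result. */
  public final void refresh() {
    Set<String> latest = detectDeadNodes();
    deadNodes.retainAll(latest); // drop nodes that are no longer reported dead
    deadNodes.addAll(latest);    // add nodes that are newly reported dead
  }

  public final boolean isDeadNode(String datanodeId) {
    return deadNodes.contains(datanodeId);
  }
}

/** Hypothetical strategy: trust a NameNode-side report (assumed Supplier). */
class NameNodeReportDetectorSketch extends DetectorSketch {
  private final Supplier<Set<String>> deadNodeReport;

  NameNodeReportDetectorSketch(Supplier<Set<String>> deadNodeReport) {
    this.deadNodeReport = deadNodeReport;
  }

  @Override
  protected Set<String> detectDeadNodes() {
    return deadNodeReport.get();
  }
}
```

With this shape, a probing detector or a detector based on block locations
would be another subclass, and callers of isDeadNode() would not need any
if-else on the detection mode.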
{quote}A point to note is getDatanodeReport is a very heavy call, Refetching
block locations again might be cheaper in some cases.
{quote}
Thanks for the reminder! Yes, this is very important. For me the cost is OK
because the dead node detector is only used for HBase, and the cluster is
always under 100 nodes. The update interval is 10 minutes, so I think it is
fine for the NameNode.
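As a rough sketch of the periodic update (again not the code in the patch),
the refresh could be driven by a single scheduled thread. The class
PeriodicDeadNodeRefresher and the Supplier that stands in for the NameNode
report are hypothetical. With a 10 minute interval, each client would issue 6
report calls per hour.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: refresh a dead-node set on a fixed interval. */
class PeriodicDeadNodeRefresher implements AutoCloseable {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  private final Supplier<Set<String>> deadNodeReport; // assumed NameNode report
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "dead-node-refresher");
        t.setDaemon(true); // do not keep the client JVM alive
        return t;
      });

  PeriodicDeadNodeRefresher(Supplier<Set<String>> deadNodeReport) {
    this.deadNodeReport = deadNodeReport;
  }

  /** For example start(10, TimeUnit.MINUTES) for the interval mentioned above. */
  void start(long interval, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(this::refreshOnce, 0, interval, unit);
  }

  void refreshOnce() {
    try {
      Set<String> latest = deadNodeReport.get();
      deadNodes.retainAll(latest);
      deadNodes.addAll(latest);
    } catch (RuntimeException e) {
      // Keep the previous view if the report fails. An uncaught exception
      // would also cancel all later runs of the scheduled task.
    }
  }

  boolean isDead(String datanodeId) {
    return deadNodes.contains(datanodeId);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Keeping the last known set when a report fails is a design choice of this
sketch only; clearing the set on failure would be the other obvious option.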
Shall I split this into 2 steps: first implement the abstract class of
DeadNodeDetector, then add the new InServiceDetector on top of it? The current
patch is a little big.
[~ayushtkn] Please correct me if I got anything wrong. I hope to hear your
further suggestions! Hi [~leosun08], do you have time to look at this? Looking
forward to your comments!
> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
> Key: HDFS-15605
> URL: https://issues.apache.org/jira/browse/HDFS-15605
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch,
> HDFS-15605.003.patch
>
>
> When we are using DeadNodeDetector, it sometimes marks too many nodes as dead
> and causes read failures. The DeadNodeDetector assumes that every datanode
> whose getDatanodeInfo rpc fails to return in time is dead, but this is not
> always true. A client-side error or a slow rpc in the DataNode might get a
> node marked as dead too. For example, a client-side delay in the
> rpcThreadPool might cause the getDatanodeInfo rpcs to time out, adding many
> datanodes to the dead list.
> We have a simple improvement for this: the NameNode already knows which
> datanodes are dead, so just update the dead list from the NameNode using
> DFSClient.datanodeReport().