[
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914760#comment-16914760
]
Duo Zhang commented on HDFS-14648:
----------------------------------
We have been running this in production for a long time, and it solves a big
problem for us in HBase.
For performance, HBase opens a DFS input stream and never closes it unless the
file has been compacted away. So when a DN breaks, every DFS input stream has
to discover the broken DN on its own, since each stream manages its own list
of live/dead nodes.
If it is just a process crash, HBase will be fine: when we touch the dead DN
we receive a connection refused immediately and then move on to other DNs.
But if the machine is completely down, we hang for a long time and finally
receive a connection timeout. The connection timeout is usually fairly large
(15 seconds in our deployment), because there is no way to set it per request,
so we have to pick a value that is greater than most of the timeout values of
HBase requests.
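To make the two failure modes concrete, here is a minimal sketch using plain
java.net sockets rather than the HDFS client code. The class name ConnectProbe,
the Outcome enum, and the probe method are hypothetical names chosen for
illustration. A crashed process on a live host typically answers with an
immediate refusal, while a host that is completely down gives no answer and
the caller blocks until the connect timeout expires.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of HDFS or HBase.
final class ConnectProbe {
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    // Tries to open a TCP connection and reports how the attempt ended.
    static Outcome probe(InetSocketAddress addr, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer at all: the caller waited the full timeoutMillis.
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // Usually "connection refused": the host is up but nothing is
            // listening, so the failure is reported almost immediately.
            // (The OS may also report its own connect failures this way.)
            return Outcome.REFUSED;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        }
    }
}
```

With a 15 second timeout, every caller that hits the TIMED_OUT path pays those
15 seconds before it can try another node, whereas the REFUSED path costs
almost nothing.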
This is really a big problem for us. For a cluster with 300+ nodes, a single
machine failure degrades availability for more than 2 hours!
So I think this is a really useful feature for HBase.
Thanks.
> DeadNodeDetector basic model
> ----------------------------
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch,
> HDFS-14648.003.patch, HDFS-14648.004.patch
>
>
> This Jira builds the DeadNodeDetector state machine model. It implements the
> following:
> # When a DFSInputStream detects that a DataNode has died, it adds the node to
> DeadNodeDetector, which shares this information with the other
> DFSInputStreams in the same DFSClient. Those DFSInputStreams will then not
> read from this DataNode.
> # DeadNodeDetector also tracks which DFSInputStreams reference each
> DataNode. When a DFSInputStream is closed, DeadNodeDetector removes its
> references. If a dead node in DeadNodeDetector is no longer read by any
> DFSInputStream, it is also removed from DeadNodeDetector.
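The two rules in the description can be sketched as a small shared registry.
This is only an illustration of the model, not the code in the attached
patches: the class name SharedDeadNodeRegistry and its methods are
hypothetical, DataNodes are represented by plain strings, and streams by
arbitrary objects.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one registry shared by all streams of a client.
final class SharedDeadNodeRegistry {
    // DataNode id -> open streams that currently reference it.
    private final Map<String, Set<Object>> readersByNode = new HashMap<>();
    // DataNodes that some stream has reported as dead.
    private final Set<String> deadNodes = new HashSet<>();

    // Records that a stream reads from the given DataNode.
    synchronized void addReader(Object stream, String dataNode) {
        readersByNode.computeIfAbsent(dataNode, k -> new HashSet<>()).add(stream);
    }

    // Rule 1: one stream's discovery becomes visible to every other stream.
    synchronized void reportDead(Object stream, String dataNode) {
        addReader(stream, dataNode);
        deadNodes.add(dataNode);
    }

    // Streams consult this before choosing a DataNode to read from.
    synchronized boolean isDead(String dataNode) {
        return deadNodes.contains(dataNode);
    }

    // Rule 2: closing a stream drops its references; a node that no open
    // stream references any more is forgotten, including its dead mark.
    synchronized void removeReader(Object stream) {
        Iterator<Map.Entry<String, Set<Object>>> it =
            readersByNode.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<Object>> entry = it.next();
            entry.getValue().remove(stream);
            if (entry.getValue().isEmpty()) {
                deadNodes.remove(entry.getKey());
                it.remove();
            }
        }
    }
}
```

In this sketch a dead mark survives as long as at least one open stream still
references the node, so a second stream never has to rediscover the failure
on its own.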