[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914760#comment-16914760
 ] 

Duo Zhang commented on HDFS-14648:
----------------------------------

We have been using this in production for a long time, and it solves a big 
problem for us in HBase.

For performance reasons, HBase opens a DFS input stream and never closes it 
unless the file has been compacted away. Every DFS input stream manages its own 
list of live/dead DataNodes, so when a DN breaks, each stream has to discover 
the broken DN on its own.
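
To make the per-stream behavior concrete, here is a minimal Java sketch. It is 
not the real DFSInputStream code: PerStreamDeadNodes and the DataNode name 
"dn-7" are hypothetical. Each stream keeps a private dead-node set, so a 
failure found (and paid for) by one stream is invisible to every other stream.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of per-stream dead node tracking (not HDFS code).
class PerStreamDeadNodes {
    // Private to this stream; nothing is shared with other streams.
    private final Set<String> deadNodes = new HashSet<>();

    void markDead(String dataNode) {
        deadNodes.add(dataNode);
    }

    boolean isDead(String dataNode) {
        return deadNodes.contains(dataNode);
    }

    public static void main(String[] args) {
        PerStreamDeadNodes streamA = new PerStreamDeadNodes();
        PerStreamDeadNodes streamB = new PerStreamDeadNodes();
        streamA.markDead("dn-7");                    // A waited out a timeout to learn this
        System.out.println(streamB.isDead("dn-7")); // false: B must rediscover it
    }
}
```

With long-lived streams, each one pays the full discovery cost separately for 
the same broken DN.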

If it is just a process crash, HBase is fine: when we contact the dead DN we 
get "connection refused" immediately and move on to other DNs. But if the 
machine is completely down, we hang for a long time and finally get a 
connection timeout. The connection timeout is usually fairly large (15 seconds 
in our deployment): there is no way to set it per request, so we have to pick 
a value greater than most of the timeout values used by HBase requests.
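
The difference between the two failure modes can be sketched with plain 
java.net sockets. This is an illustration, not HDFS client code; ConnectProbe 
and its Outcome enum are hypothetical names. A closed port on a live host 
typically fails fast with ConnectException, while a host that is down gives no 
reply, so Socket.connect blocks until its timeout expires and then throws 
SocketTimeoutException.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical probe contrasting a fast refusal with a slow connect timeout.
class ConnectProbe {
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    static Outcome probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Outcome.CONNECTED;
        } catch (ConnectException e) {
            // Typically "connection refused": the host answered, the process is gone.
            return Outcome.REFUSED;
        } catch (SocketTimeoutException e) {
            // No answer at all (e.g. machine down): we blocked for timeoutMs.
            return Outcome.TIMED_OUT;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) throws IOException {
        // Grab a free loopback port, then release it so nothing listens there.
        ServerSocket tmp = new ServerSocket(0);
        int closedPort = tmp.getLocalPort();
        tmp.close();

        long start = System.nanoTime();
        Outcome outcome = probe("127.0.0.1", closedPort, 15_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(outcome + " after " + elapsedMs + " ms");
    }
}
```

Running main against the released loopback port should report REFUSED almost 
immediately; probing a powered-off host would instead block for the full 
timeout, which is the case that hurts HBase.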

This is a really big problem for us. In a cluster of 300+ nodes, a single 
machine failure can degrade availability for more than 2 hours!

So I think this is a really useful feature for HBase.

Thanks.

> DeadNodeDetector basic model
> ----------------------------
>
>                 Key: HDFS-14648
>                 URL: https://issues.apache.org/jira/browse/HDFS-14648
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements 
> the following:
>  # After a DFSInputStream detects that a DataNode has died, it puts the 
> DataNode into DeadNodeDetector, which shares this information with the other 
> DFSInputStreams in the same DFSClient. The other DFSInputStreams will then not 
> read from this DataNode.
>  # DeadNodeDetector also records which DFSInputStreams reference each 
> DataNode. When a DFSInputStream is closed, DeadNodeDetector removes its 
> references. If a dead node in DeadNodeDetector is no longer read by any 
> DFSInputStream, it is removed from DeadNodeDetector as well.
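
A minimal Java sketch of the two behaviors described in the issue, assuming 
streams are identified by an int id and DataNodes by a String. 
SharedDeadNodeRegistry and its methods are hypothetical and do not reflect the 
actual DeadNodeDetector API in the patches. One registry is shared by all 
streams of a client; each dead node keeps the set of streams referencing it 
and is forgotten once that set becomes empty.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-client dead node registry (not the real DeadNodeDetector).
class SharedDeadNodeRegistry {
    // DataNode id -> ids of the input streams that reference it as dead.
    private final Map<String, Set<Integer>> deadNodeRefs = new HashMap<>();

    // A stream reports that it failed to reach a DataNode.
    synchronized void markDead(int streamId, String dataNode) {
        deadNodeRefs.computeIfAbsent(dataNode, k -> new HashSet<>()).add(streamId);
    }

    // Every stream of the same client consults the shared view before reading.
    synchronized boolean isDead(String dataNode) {
        return deadNodeRefs.containsKey(dataNode);
    }

    // On stream close, drop its references; unreferenced dead nodes are removed.
    synchronized void streamClosed(int streamId) {
        deadNodeRefs.values().forEach(refs -> refs.remove(streamId));
        deadNodeRefs.values().removeIf(Set::isEmpty);
    }

    public static void main(String[] args) {
        SharedDeadNodeRegistry registry = new SharedDeadNodeRegistry();
        registry.markDead(1, "dn-7");                 // stream 1 times out on dn-7
        System.out.println(registry.isDead("dn-7"));  // true: stream 2 skips dn-7
        registry.streamClosed(1);                     // last reference is dropped
        System.out.println(registry.isDead("dn-7"));  // false
    }
}
```

Keying the references by DataNode keeps the isDead check on the read path to a 
single map lookup, at the cost of scanning all dead nodes when a stream closes.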



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
