[ 
https://issues.apache.org/jira/browse/HDFS-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884704#comment-16884704
 ] 

Yiqun Lin commented on HDFS-13571:
----------------------------------

Thanks for the summary, [~leosun08]. The design mostly looks good to me, except 
for the following point:
{quote}When an InputStream is opened, a BlockReader is opened, and the DataNode 
involved in the Block is added to the Live Node list, which DeadNodeDetector 
periodically probes. If it is found to be inaccessible, put the 
DataNode into the Dead Node....
{quote}
I don't think it is effective to always add the DataNodes of all opened 
blocks to the live node list and probe their liveness. DeadNodeDetector will 
quickly reach the maximum size of the live node list once InputStreams open 
many blocks and stay open for a long time. DeadNodeDetector will then do many 
unnecessary liveness checks, since dead and suspicious nodes should be only a 
very small fraction of them. Instead, the live node list could be used only to 
check the liveness of nodes that have transitioned back from the dead or 
suspicious state.
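
To illustrate the alternative, here is a minimal sketch of a detector that 
tracks only the nodes an input stream has reported as suspicious, so the 
probing cost follows the number of suspected nodes and not the number of 
opened blocks. It is not the code from the attached patch. The class name, the 
method names, the probe callback and the size cap are all hypothetical, and 
dropping a recovered node immediately is a simplification made for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch only; not the DeadNodeDetector from HDFS-13571. */
class SuspectOnlyDetectorSketch {
    enum State { SUSPICIOUS, DEAD }

    // Only nodes that some input stream failed to reach are tracked here.
    private final Map<String, State> tracked = new ConcurrentHashMap<>();
    // Assumed liveness probe: returns true when the DataNode responds.
    private final Predicate<String> probe;
    // Assumed upper bound on the number of tracked nodes.
    private final int maxTracked;

    SuspectOnlyDetectorSketch(Predicate<String> probe, int maxTracked) {
        this.probe = probe;
        this.maxTracked = maxTracked;
    }

    /**
     * Called by an input stream after a failed read or connect.
     * Returns false when the node cannot be tracked because the cap is hit.
     */
    synchronized boolean reportSuspicious(String dataNode) {
        if (tracked.containsKey(dataNode)) {
            return true;
        }
        if (tracked.size() >= maxTracked) {
            return false;
        }
        tracked.put(dataNode, State.SUSPICIOUS);
        return true;
    }

    /** One probing round; a real detector would run this periodically. */
    void probeOnce() {
        for (String dataNode : tracked.keySet()) {
            if (probe.test(dataNode)) {
                tracked.remove(dataNode);           // responds: stop tracking
            } else {
                tracked.put(dataNode, State.DEAD);  // still unreachable
            }
        }
    }

    /** Shared view that every input stream of the client could consult. */
    boolean isDead(String dataNode) {
        return tracked.get(dataNode) == State.DEAD;
    }

    int trackedCount() {
        return tracked.size();
    }
}
```

In this sketch a healthy DataNode never enters the tracked set, so opening many 
blocks for a long time does not fill it up. Only read failures do.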

Everything else looks good to me. :)

> Dead DataNode Detector
> ----------------------
>
>                 Key: HDFS-13571
>                 URL: https://issues.apache.org/jira/browse/HDFS-13571
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.6.0, 3.0.2
>            Reporter: Gang Xie
>            Assignee: Lisheng Sun
>            Priority: Minor
>         Attachments: HDFS-13571-2.6.diff, node status machine.png
>
>
> Currently, the information about dead datanodes in DFSInputStream is stored 
> locally, so it cannot be shared among the input streams of the same 
> DFSClient. In our production environment, some datanodes die every day for 
> various reasons. When this happens, the first input stream that is blocked 
> and detects the dead node cannot share this information with the others in 
> the same DFSClient, so the other input streams are still blocked by the dead 
> node for some time, which can cause bad service latency.
> To eliminate this impact of dead datanodes, we designed a dead datanode 
> detector, which detects the dead ones in advance and shares this information 
> among all the input streams in the same client. This improvement has been 
> online for some months and works fine, so we have decided to port it to 3.0 
> (the versions used in our production environment are 2.4 and 2.6).
> I will do the porting work and upload the code later.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
