[ 
https://issues.apache.org/jira/browse/HDFS-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897659#comment-13897659
 ] 

Binglin Chang commented on HDFS-5917:
-------------------------------------

Hi Liang Xie, have you looked at HDFS-4273?  I used a similar method there; the 
only difference is that I only expire the local node. Retrying a connection to 
the local node detects failure very quickly (no connection timeout), but if you 
expire a remote node and try to reconnect while that node is still down, you 
may wait a long time before you can try another live node. When that happens, 
it increases I/O latency a lot.
Another minor comment:  deadNodesRefreshIntervalMs is not necessary and does 
not need to hold a config key; we can always check the expiry state in the 
loop that chooses a datanode. 
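
A minimal sketch of the idea above, in Java: dead-node entries carry a 
timestamp, only the local node's entry is allowed to expire, and the expiry 
check happens inside the loop that picks a datanode rather than on a separate 
refresh timer. This is not the HDFS-4273 or HDFS-5917 patch; the class 
DeadNodeTracker, its methods, and the expiryMs parameter are hypothetical names 
used only for illustration, and nodes are plain strings rather than real 
datanode descriptors.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not the actual DFSInputStream code. */
class DeadNodeTracker {
    // Maps a node name to the time (ms) at which it was marked dead.
    private final Map<String, Long> deadNodes = new ConcurrentHashMap<>();
    private final String localNode;
    private final long expiryMs; // assumed expiry window for the local node

    DeadNodeTracker(String localNode, long expiryMs) {
        this.localNode = localNode;
        this.expiryMs = expiryMs;
    }

    void markDead(String node, long nowMs) {
        deadNodes.put(node, nowMs);
    }

    /** Returns true if the node should still be skipped at time nowMs. */
    boolean isDead(String node, long nowMs) {
        Long markedAt = deadNodes.get(node);
        if (markedAt == null) {
            return false;
        }
        // Only the local node expires: retrying it fails fast if it is
        // still down, whereas retrying a remote node may block on a
        // connection timeout.
        if (node.equals(localNode) && nowMs - markedAt >= expiryMs) {
            deadNodes.remove(node);
            return false;
        }
        return true;
    }

    /** Picks the first replica not considered dead, or null if none. */
    String chooseDataNode(List<String> replicas, long nowMs) {
        for (String node : replicas) {
            if (!isDead(node, nowMs)) { // expiry is checked here, in the loop
                return node;
            }
        }
        return null;
    }
}
```

Because the check runs on every selection, no background refresh interval is 
needed in this sketch; whether the expiry window itself should be configurable 
is a separate choice.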

> Have an ability to refresh deadNodes list periodically
> ------------------------------------------------------
>
>                 Key: HDFS-5917
>                 URL: https://issues.apache.org/jira/browse/HDFS-5917
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5917.txt
>
>
> In the current HBase + HDFS trunk implementation, once a node is added to 
> the deadNodes map, it cannot be chosen again until deadNodes.clear() is 
> invoked. When I fixed HDFS-5637, I had a rough thought: since quite a few 
> conditions can cause a node to be added to the deadNodes map, it would be 
> better if we had the ability to refresh this cached map automatically. It is 
> good for the HBase scenario at least. For example, before HDFS-5637 was 
> fixed, if a local node was added to deadNodes, reads would go to remote 
> nodes even if the local node was actually alive :) Worse, if the block is in 
> a huge HFile that is not picked up by any minor compaction for a while, the 
> performance penalty continues until a large compaction, a region reopen, or 
> an invocation of deadNodes.clear()...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
