[
https://issues.apache.org/jira/browse/HDFS-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898645#comment-13898645
]
Liang Xie commented on HDFS-5917:
---------------------------------
[~decster], thanks for your comments! Yes, I understand your concern. My
understanding is:
1) We need deadNodesRefreshIntervalMs because we don't know the size of
deadNodes; we cannot always assume it holds only one or two entries, right?
An end user can set the replication factor higher than the default of 3.
In any case, the deadNodesRefreshIntervalMs parameter is just a shortcut
optimization.
2) "if the node is still down, you may wait a long time before we can try
another live node, when happens, this increases io latency a lot": in current
trunk code, we already have configurable parameters to control the latency
caused by retries, right? :)
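
To make point 1 concrete, here is a minimal sketch of the idea, not the code in
the attached patch. The class name RefreshingDeadNodes, the String node keys,
and the explicit nowMs argument are assumptions made for illustration only;
just the deadNodesRefreshIntervalMs name comes from the discussion. Instead of
probing each entry, the whole set is cleared once the interval has elapsed, so
the cost does not depend on how many entries deadNodes holds.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a dead-node set that is cleared after a fixed interval. */
final class RefreshingDeadNodes {
    private final Set<String> deadNodes = new HashSet<>();
    private final long deadNodesRefreshIntervalMs;
    private long lastRefreshMs;

    RefreshingDeadNodes(long deadNodesRefreshIntervalMs, long nowMs) {
        this.deadNodesRefreshIntervalMs = deadNodesRefreshIntervalMs;
        this.lastRefreshMs = nowMs;
    }

    /** Record a node as dead until the next refresh. */
    synchronized void markDead(String node) {
        deadNodes.add(node);
    }

    /**
     * Returns true if the node is currently considered dead. If the refresh
     * interval has elapsed, the whole set is cleared first, so every node
     * gets another chance regardless of how many entries were recorded.
     */
    synchronized boolean isDead(String node, long nowMs) {
        if (nowMs - lastRefreshMs >= deadNodesRefreshIntervalMs) {
            deadNodes.clear();
            lastRefreshMs = nowMs;
        }
        return deadNodes.contains(node);
    }
}
```

With an interval of 1000 ms, a node marked dead at time 0 is skipped for reads
until time 1000, after which it becomes a candidate again. If it is really
still down, the existing retry settings bound the extra latency, as in point 2.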
> Have an ability to refresh deadNodes list periodically
> ------------------------------------------------------
>
> Key: HDFS-5917
> URL: https://issues.apache.org/jira/browse/HDFS-5917
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Attachments: HDFS-5917.txt
>
>
> In the current HBase + HDFS trunk implementation, once a node is added to
> the deadNodes map, it cannot be chosen again until deadNodes.clear() is
> invoked. When I fixed HDFS-5637, I had a rough idea: since quite a few
> conditions can cause a node to be added to the deadNodes map, it would be
> better to have the ability to refresh this cached map automatically. This
> helps the HBase scenario at least. For example, before HDFS-5637 was fixed,
> if a local node was added to deadNodes, reads went to remote nodes even if
> the local node was actually alive :) Worse, if the block belongs to a huge
> HFile that is not picked for any minor compaction in the short term, the
> performance penalty persists until a large compaction runs, the region is
> reopened, or deadNodes.clear() is invoked.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)