[
https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410148#comment-17410148
]
Xiaoqiao He commented on HDFS-16200:
------------------------------------
Thanks [~aihuaxu] for your report. I think the best way is to improve the
performance of topology resolution and rack awareness rather than disable
it directly. FYI.
> Improve NameNode failover
> -------------------------
>
> Key: HDFS-16200
> URL: https://issues.apache.org/jira/browse/HDFS-16200
> Project: Hadoop HDFS
> Issue Type: Task
> Components: namanode
> Affects Versions: 2.8.2
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In a busy cluster, we are noticing that NameNode failover takes a long time
> (over 10 minutes), which causes cluster downtime during that period.
> One bottleneck is resolving the client host's topology when the
> cluster is not colocated with the computing hosts. The NameNode resolves the
> client host's topology and uses it to sort the hosts where the blocks are
> located. The topology is cached so that later accesses are efficient, but
> if the standby NameNode is newly restarted, then all the client hosts (e.g.,
> YARN hosts) need to be resolved again.
> Solutions can be: 1) we can expose an API in DFSAdmin to load the topology
> cache, or 2) we can add a new configuration in the HDFS cluster to skip
> resolving topology for a non-colocated HDFS cluster. Since client hosts and HDFS hosts
> are not colocated, it's unnecessary to sort the DataNodes for the clients.
>
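For illustration only, here is a minimal Java sketch of the two options
described above: warming a topology cache ahead of time (option 1) and
skipping resolution entirely (option 2). It is not Hadoop code. The class
name TopologyCacheSketch, the preload method, the skipResolution flag, and
the slowResolver function (standing in for an expensive topology script or
DNS lookup) are all hypothetical names chosen for this example, and
"/default-rack" is used only as a placeholder rack value.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch; does not mirror Hadoop's actual classes or APIs. */
class TopologyCacheSketch {
    // Placeholder rack returned when resolution is skipped (assumption).
    static final String DEFAULT_RACK = "/default-rack";

    // Stands in for an expensive lookup such as a topology script call.
    private final Function<String, String> slowResolver;
    // Hypothetical switch modelling option 2 (skip resolution).
    private final boolean skipResolution;
    // Host -> rack cache; starts empty after a "restart".
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    TopologyCacheSketch(Function<String, String> slowResolver,
                        boolean skipResolution) {
        this.slowResolver = slowResolver;
        this.skipResolution = skipResolution;
    }

    /** Option 1: resolve known client hosts before serving requests. */
    void preload(List<String> hosts) {
        for (String host : hosts) {
            resolve(host);
        }
    }

    /** Returns the rack for a host, calling the slow resolver at most once per host. */
    String resolve(String host) {
        if (skipResolution) {
            return DEFAULT_RACK; // Option 2: never touch the slow resolver.
        }
        return cache.computeIfAbsent(host, slowResolver);
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        Function<String, String> slow = host -> {
            lookups.incrementAndGet(); // count expensive lookups
            return "/rack-a";
        };

        TopologyCacheSketch warmed = new TopologyCacheSketch(slow, false);
        warmed.preload(List.of("yarn-1", "yarn-2"));
        warmed.resolve("yarn-1"); // served from the cache, no new lookup
        System.out.println("lookups after preload: " + lookups.get());

        TopologyCacheSketch skipping = new TopologyCacheSketch(slow, true);
        System.out.println("rack when skipping: " + skipping.resolve("yarn-3"));
    }
}
```

In this sketch, preloading moves the cost of resolution to a point before
requests arrive, while the skip flag avoids the cost altogether at the
price of losing rack-based sorting.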
--
This message was sent by Atlassian Jira
(v8.3.4#803005)