[ https://issues.apache.org/jira/browse/HBASE-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221275#comment-14221275 ]
Ted Yu commented on HBASE-12554:
--------------------------------
bq. The 60seconds should be configurable
How about introducing a config parameter called
'hbase.ip.to.rack.determiner.timeout', with the value in milliseconds?
Do you think 60 seconds is an acceptable default?
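A minimal sketch of how the proposed setting could be read, assuming the key name above and a 60-second default. `java.util.Properties` stands in for Hadoop's `Configuration` so the example is self-contained; `RackTimeoutConfig`, `TIMEOUT_KEY` and `DEFAULT_TIMEOUT_MS` are hypothetical names, not existing HBase code.

```java
import java.util.Properties;

/** Hypothetical holder for the proposed rack-lookup timeout setting. */
final class RackTimeoutConfig {
  // Proposed key from this comment; not an existing HBase setting.
  static final String TIMEOUT_KEY = "hbase.ip.to.rack.determiner.timeout";
  // Proposed default: 60 seconds, expressed in milliseconds.
  static final long DEFAULT_TIMEOUT_MS = 60_000L;

  private RackTimeoutConfig() {}

  /** Returns the configured timeout in ms, falling back to the default when unset. */
  static long timeoutMs(Properties conf) {
    String value = conf.getProperty(TIMEOUT_KEY);
    if (value == null) {
      return DEFAULT_TIMEOUT_MS;
    }
    long ms = Long.parseLong(value.trim());
    if (ms <= 0) {
      // A zero or negative timeout would make every lookup fail immediately.
      throw new IllegalArgumentException(TIMEOUT_KEY + " must be positive, got " + ms);
    }
    return ms;
  }
}
```

With Hadoop's `Configuration`, the equivalent read would be `conf.getLong(TIMEOUT_KEY, DEFAULT_TIMEOUT_MS)`.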
bq. Does the cancel actually interrupt the ongoing lookup or does it leave it hanging?
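Future.cancel(true) interrupts the thread running the lookup. A blocking native
DNS call may not respond to the interrupt, but the caller stops waiting either way. See:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html#cancel(boolean)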
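A small demonstration of what `cancel(true)` does after a timed-out `Future.get()`. `Thread.sleep` stands in for the lookup here because, unlike a native DNS call, it is guaranteed to respond to interruption; `CancelDemo` and `taskSawInterrupt` are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: cancel(true) on a task that exceeds its timeout. */
final class CancelDemo {
  private CancelDemo() {}

  /**
   * Runs a task that sleeps far longer than timeoutMs, cancels it with
   * mayInterruptIfRunning=true, and reports whether the task saw the interrupt.
   */
  static boolean taskSawInterrupt(long timeoutMs) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch interrupted = new CountDownLatch(1);
    try {
      Future<String> f = pool.submit(() -> {
        try {
          Thread.sleep(60_000L); // stands in for a slow, interruptible lookup
          return "done";
        } catch (InterruptedException e) {
          interrupted.countDown(); // cancel(true) delivered Thread.interrupt()
          throw e;
        }
      });
      try {
        f.get(timeoutMs, TimeUnit.MILLISECONDS);
        return false; // finished before the timeout; nothing was cancelled
      } catch (TimeoutException e) {
        f.cancel(true); // interrupts the worker thread running the task
      }
      // Give the worker a moment to react to the interrupt.
      return interrupted.await(5, TimeUnit.SECONDS);
    } catch (ExecutionException e) {
      return false; // the task failed for some other reason
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return false;
    } finally {
      pool.shutdownNow();
    }
  }
}
```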
bq. Who cares about a lookup in test?
With the timeout parameter proposed above, the test can set the timeout to a
very low value, such as 10 milliseconds.
What do you think?
> TestBaseLoadBalancer may timeout due to lengthy rack lookup
> -----------------------------------------------------------
>
> Key: HBASE-12554
> URL: https://issues.apache.org/jira/browse/HBASE-12554
> Project: HBase
> Issue Type: Test
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 12554-v1.txt
>
>
> Here is one of the recent occurrences
> (https://builds.apache.org/job/PreCommit-HBASE-Build/11778/console):
> {code}
> testImmediateAssignment(org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer)  Time elapsed: 30.019 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30000 milliseconds
> at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
> at java.net.InetAddress.getAllByName(InetAddress.java:1162)
> at java.net.InetAddress.getAllByName(InetAddress.java:1098)
> at java.net.InetAddress.getByName(InetAddress.java:1048)
> at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:561)
> at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:578)
> at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
> at org.apache.hadoop.hbase.master.RackManager.getRack(RackManager.java:66)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:273)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:1113)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1175)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.immediateAssignment(BaseLoadBalancer.java:1145)
> at org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer.testImmediateAssignment(TestBaseLoadBalancer.java:136)
> {code}
> One possible fix is to submit CachedDNSToSwitchMapping.resolve() to an executor
> pool. RackManager.getRack() can then wait with a timeout and return
> UNKNOWN_RACK if the lookup has not completed in time.
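The fix proposed in the description can be sketched as follows, under stated assumptions: `TimedRackLookup` is a hypothetical stand-in for RackManager, the `Function<String, String>` resolver stands in for Hadoop's `DNSToSwitchMapping.resolve(List<String>)`, and the `UNKNOWN_RACK` value below is a placeholder that only mirrors the constant's name.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical stand-in for RackManager that bounds each rack lookup with a timeout. */
final class TimedRackLookup implements AutoCloseable {
  // Placeholder value (assumption); the real constant lives in RackManager.
  static final String UNKNOWN_RACK = "Unknown Rack";

  private final Function<String, String> resolver; // stands in for DNSToSwitchMapping.resolve()
  private final long timeoutMs;
  private final ExecutorService pool;

  TimedRackLookup(Function<String, String> resolver, long timeoutMs) {
    this.resolver = resolver;
    this.timeoutMs = timeoutMs;
    // Daemon threads, so a lookup stuck in native code cannot keep the JVM alive.
    this.pool = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r, "rack-lookup");
      t.setDaemon(true);
      return t;
    });
  }

  /** Returns the resolved rack, or UNKNOWN_RACK if the lookup fails or exceeds the timeout. */
  String getRack(String hostname) {
    Future<String> f = pool.submit(() -> resolver.apply(hostname));
    try {
      String rack = f.get(timeoutMs, TimeUnit.MILLISECONDS);
      return rack != null ? rack : UNKNOWN_RACK;
    } catch (TimeoutException e) {
      f.cancel(true); // best effort: a native DNS call may ignore the interrupt
      return UNKNOWN_RACK;
    } catch (ExecutionException e) {
      return UNKNOWN_RACK; // the resolver itself threw
    } catch (InterruptedException e) {
      f.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return UNKNOWN_RACK;
    }
  }

  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

In a test, constructing the lookup with a very low timeout (for example 10 ms) bounds how long each getRack() call can block. A cached pool is the simplest choice, but each stuck lookup keeps a thread alive; a bounded pool avoids that at the cost of new lookups queueing behind stuck ones.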