[ 
https://issues.apache.org/jira/browse/HBASE-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221275#comment-14221275
 ] 

Ted Yu commented on HBASE-12554:
--------------------------------

bq. The 60seconds should be configurable
How about introducing a config parameter called 
'hbase.ip.to.rack.determiner.timeout', with milliseconds as its unit?
Would 60 seconds (60000 ms) be an acceptable default?
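
To make the proposal concrete, here is a rough sketch of how such a parameter could be read with a 60 second default. It uses java.util.Properties as a stand-in for the HBase configuration object so that it compiles on its own; the class and method names are hypothetical and not taken from any patch.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class RackTimeoutConfig {
  // Key proposed in this comment; the value is in milliseconds.
  static final String TIMEOUT_KEY = "hbase.ip.to.rack.determiner.timeout";
  // Proposed default of 60 seconds.
  static final long DEFAULT_TIMEOUT_MS = 60_000L;

  // Returns the configured timeout, or the default when the key is unset or blank.
  static long timeoutMs(Properties conf) {
    String raw = conf.getProperty(TIMEOUT_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_TIMEOUT_MS;
    }
    return Long.parseLong(raw.trim());
  }
}
```

With nothing set, timeoutMs() yields 60000; a test could set the key to "10" to get a 10 ms timeout.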

bq. Does the cancel actually interrupt the ongoing lookup or does it leave it 
hanging?
Calling cancel(true) on the Future attempts to interrupt the thread that is 
running the lookup. See:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html#cancel(boolean)
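
A small standalone sketch of that behavior, assuming the task is blocked in an interruptible call (Thread.sleep() stands in for the slow lookup here). CancelDemo and its method are hypothetical names. Note that a task which is not blocked in an interruptible operation only gets its interrupt flag set, so how quickly a real DNS lookup reacts is a separate question.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, for illustration only.
final class CancelDemo {
  // Returns true if the running task observed the interrupt triggered by cancel(true).
  static boolean cancelInterruptsRunningTask() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean interrupted = new AtomicBoolean(false);
    try {
      Future<?> future = pool.submit(() -> {
        started.countDown();
        try {
          Thread.sleep(60_000L); // stand-in for a slow lookup
        } catch (InterruptedException e) {
          interrupted.set(true); // cancel(true) interrupted this thread
        } finally {
          finished.countDown();
        }
      });
      started.await();                     // make sure the task is running
      future.cancel(true);                 // mayInterruptIfRunning = true
      finished.await(5, TimeUnit.SECONDS); // wait for the task to unwind
      return interrupted.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the caller's interrupt status
      return false;
    } finally {
      pool.shutdownNow();
    }
  }
}
```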

bq. Who cares about a lookup in test?
With the timeout parameter introduced above, the test can set the timeout to 
a very low value, e.g. 10 milliseconds, so that a slow lookup no longer 
stalls it.
What do you think?
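
For illustration, a minimal sketch of a lookup wrapped in an executor with a timeout, which falls back to an unknown-rack value when the lookup is too slow. TimedRackLookup, its constructor argument and the UNKNOWN_RACK value below are assumptions made for this sketch; it is not the RackManager code or the attached patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper, for illustration only.
final class TimedRackLookup {
  // Stand-in for the value returned when the rack cannot be determined.
  static final String UNKNOWN_RACK = "Unknown Rack";

  // Daemon threads, so a stuck lookup cannot keep the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "rack-lookup");
    t.setDaemon(true);
    return t;
  });
  private final long timeoutMs;

  TimedRackLookup(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Runs the lookup on the pool and waits at most timeoutMs for its result.
  String getRack(Callable<String> lookup) {
    Future<String> future = pool.submit(lookup);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // ask the worker thread to stop
      return UNKNOWN_RACK;
    } catch (ExecutionException e) {
      return UNKNOWN_RACK; // the lookup itself failed
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return UNKNOWN_RACK;
    }
  }
}
```

A test could construct the wrapper with a small timeout and pass a deliberately slow lookup; getRack() should then return the unknown-rack value shortly after the timeout expires instead of blocking for the full lookup.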

> TestBaseLoadBalancer may timeout due to lengthy rack lookup
> -----------------------------------------------------------
>
>                 Key: HBASE-12554
>                 URL: https://issues.apache.org/jira/browse/HBASE-12554
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 12554-v1.txt
>
>
> Here is one of the recent occurrences 
> (https://builds.apache.org/job/PreCommit-HBASE-Build/11778/console):
> {code}
> testImmediateAssignment(org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer)  Time elapsed: 30.019 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30000 milliseconds
>       at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>       at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>       at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>       at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>       at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>       at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>       at java.net.InetAddress.getByName(InetAddress.java:1048)
>       at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:561)
>       at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:578)
>       at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
>       at org.apache.hadoop.hbase.master.RackManager.getRack(RackManager.java:66)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:273)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:1113)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1175)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.immediateAssignment(BaseLoadBalancer.java:1145)
>       at org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer.testImmediateAssignment(TestBaseLoadBalancer.java:136)
> {code}
> One possible fix is to submit CachedDNSToSwitchMapping.resolve() to an 
> executor pool for execution. RackManager.getRack() can then apply a timeout, 
> beyond which UNKNOWN_RACK is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
