[ 
https://issues.apache.org/jira/browse/HBASE-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292481#comment-15292481
 ] 

Gary Helmling commented on HBASE-15856:
---------------------------------------

In {{testFailover()}} the master is failing to start with:
{noformat}
2016-05-19 17:48:53,221 FATAL [localhost:37057.activeMasterManager] master.HMaster$1(1769): Failed to become active master
java.net.UnknownHostException: 0.example.org
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:340)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:271)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1265)
        at org.apache.hadoop.hbase.client.ConnectionUtils$1.getAdmin(ConnectionUtils.java:135)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1246)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getCachedConnection(MetaTableLocator.java:387)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaServerConnection(MetaTableLocator.java:368)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.verifyMetaRegionLocation(MetaTableLocator.java:283)
        at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:921)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:759)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:194)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1765)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

So it seems that the {{MetaTableLocator.verifyMetaRegionLocation()}} call was 
not really effective at catching UnknownHostException prior to this patch, 
since the UHE would never actually be triggered there.

> Cached Connection instances can wind up with addresses never resolved
> ---------------------------------------------------------------------
>
>                 Key: HBASE-15856
>                 URL: https://issues.apache.org/jira/browse/HBASE-15856
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.2.2
>
>         Attachments: HBASE-15856.001.patch, HBASE-15856.002.patch
>
>
> During periods where DNS is not working properly, we can wind up caching 
> connections to master or regionservers where the initial hostname resolution 
> failed and the resolution is never re-attempted.  This means that clients will 
> forever get UnknownHostException for any calls.
> When constructing a BlockingRpcChannelImplementation, we instantiate the 
> InetSocketAddress to use for the connection.  This instance is then used in 
> the rpc client connection, where we check isUnresolved() and throw an 
> UnknownHostException if that returns true.  However, at this point the rpc 
> channel is already cached in the HConnectionImplementation map of stubs.  So 
> at this point it will never be resolved.
> Setting the config for hbase.resolve.hostnames.on.failure masks this issue, 
> since the stub key used is modified to contain the address.  However, even in 
> that case, if DNS fails, an rpc channel instance with unresolved ISA will 
> still be cached in the stubs under the hostname only key.
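
The behavior described in the issue can be reproduced with plain java.net classes. The following is only a minimal sketch, not the HBase patch: the class name ResolveOnUse and the method resolveIfNeeded are hypothetical, and the literal address and port are chosen purely so the example runs without DNS. It shows that an InetSocketAddress never re-resolves on its own, so a cached instance stays unresolved until a new one is constructed.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ResolveOnUse {
    /**
     * Hypothetical helper: returns the cached address if it is already
     * resolved, otherwise attempts a fresh lookup instead of reusing the
     * unresolved instance forever.
     */
    public static InetSocketAddress resolveIfNeeded(InetSocketAddress cached)
            throws UnknownHostException {
        if (!cached.isUnresolved()) {
            return cached;
        }
        // The (String, int) constructor attempts resolution at construction time.
        InetSocketAddress fresh =
            new InetSocketAddress(cached.getHostString(), cached.getPort());
        if (fresh.isUnresolved()) {
            // Still failing: report it, but do not cache the unresolved instance.
            throw new UnknownHostException(cached.getHostString());
        }
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        // Simulates an address that was created while resolution was unavailable.
        InetSocketAddress cached = InetSocketAddress.createUnresolved("127.0.0.1", 16000);
        System.out.println("cached unresolved: " + cached.isUnresolved());

        // A new instance is required; the cached one would stay unresolved.
        InetSocketAddress fresh = resolveIfNeeded(cached);
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

The point of the sketch is that any retry has to happen at the place where the address is used or cached, since checking isUnresolved() on the stored instance will keep returning true.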



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
