[ https://issues.apache.org/jira/browse/HBASE-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292481#comment-15292481 ]
Gary Helmling commented on HBASE-15856:
---------------------------------------
In {{testFailover()}} the master is failing to start with:
{noformat}
2016-05-19 17:48:53,221 FATAL [localhost:37057.activeMasterManager] master.HMaster$1(1769): Failed to become active master
java.net.UnknownHostException: 0.example.org
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:340)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:271)
	at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1265)
	at org.apache.hadoop.hbase.client.ConnectionUtils$1.getAdmin(ConnectionUtils.java:135)
	at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1246)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getCachedConnection(MetaTableLocator.java:387)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaServerConnection(MetaTableLocator.java:368)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.verifyMetaRegionLocation(MetaTableLocator.java:283)
	at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:921)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:759)
	at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:194)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1765)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
So it seems that, prior to this patch, the {{MetaTableLocator.verifyMetaRegionLocation()}} call was
not really effective against UnknownHostException, since a UHE would never
actually be triggered there.
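The trace shows the UnknownHostException now being thrown from the {{BlockingRpcChannelImplementation}} constructor. A minimal sketch of that idea follows; it is not the actual HBase code, and the names (EagerCheckCache, Channel, getChannel, the "host:port" key) are hypothetical. It assumes the isUnresolved() check simply moves to construction time: because the constructor throws, the failed channel is never stored, the caller sees the exception immediately, and the next lookup gets a fresh chance to resolve the host.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of an eager resolution check; not the real HBase classes. */
final class EagerCheckCache {

  /** Stand-in for an RPC channel that validates its address up front. */
  static final class Channel {
    final InetSocketAddress address;

    Channel(InetSocketAddress address) throws UnknownHostException {
      // Fail at construction time, before the channel can be cached.
      if (address.isUnresolved()) {
        throw new UnknownHostException(address.getHostString());
      }
      this.address = address;
    }
  }

  // Stand-in for the connection's map of stubs, keyed by a hypothetical "host:port" string.
  private final Map<String, Channel> stubs = new HashMap<>();

  Channel getChannel(String key, InetSocketAddress address) throws UnknownHostException {
    Channel channel = stubs.get(key);
    if (channel == null) {
      // If the constructor throws, nothing is put into the map.
      channel = new Channel(address);
      stubs.put(key, channel);
    }
    return channel;
  }

  int size() {
    return stubs.size();
  }
}
```

In a quick check, InetSocketAddress.createUnresolved(...) can stand in for a failed DNS lookup without touching the network: getChannel then throws and leaves the map empty, while a later call with a resolved address (for example the loopback address) succeeds and is cached.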
> Cached Connection instances can wind up with addresses never resolved
> ---------------------------------------------------------------------
>
> Key: HBASE-15856
> URL: https://issues.apache.org/jira/browse/HBASE-15856
> Project: HBase
> Issue Type: Bug
> Components: Client
> Reporter: Gary Helmling
> Assignee: Gary Helmling
> Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.2.2
>
> Attachments: HBASE-15856.001.patch, HBASE-15856.002.patch
>
>
> During periods where DNS is not working properly, we can wind up caching
> connections to master or regionservers where the initial hostname resolution
> fails and the resolution is never re-attempted. This means that clients will
> forever get UnknownHostException on every call.
> When constructing a BlockingRpcChannelImplementation, we instantiate the
> InetSocketAddress to use for the connection. This instance is then used in
> the RPC client connection, where we check isUnresolved() and throw an
> UnknownHostException if that returns true. However, by that point the RPC
> channel is already cached in the HConnectionImplementation map of stubs, so
> the address will never be resolved again.
> Setting the hbase.resolve.hostnames.on.failure config masks this issue,
> since the stub key is then modified to contain the address. However, even in
> that case, if DNS fails, an RPC channel instance with an unresolved
> InetSocketAddress (ISA) will still be cached in the stubs under the
> hostname-only key.
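To make the caching problem described above concrete, here is a minimal model. The names (LazyCheckCache, Channel, getChannel, call) and the "host:port" key are hypothetical, not HBase APIs, and the sketch assumes, as the description says, that the isUnresolved() check only runs when a call is made. Once a channel built from an unresolved address lands in the map, later lookups under the same key return that stale channel even after DNS recovers.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the caching bug; not the real HBase classes. */
final class LazyCheckCache {

  /** Stand-in for an RPC channel: it captures the address once, at construction. */
  static final class Channel {
    final InetSocketAddress address;

    Channel(InetSocketAddress address) {
      this.address = address; // no resolution check here
    }

    /** The unresolved check only happens when a call is made. */
    void call() throws UnknownHostException {
      if (address.isUnresolved()) {
        throw new UnknownHostException(address.getHostString());
      }
      // A real client would connect and send the request here.
    }
  }

  // Stand-in for the connection's map of stubs, keyed by a hypothetical "host:port" string.
  private final Map<String, Channel> stubs = new HashMap<>();

  Channel getChannel(String key, InetSocketAddress address) {
    // The channel is cached BEFORE anyone checks isUnresolved(), so a failed
    // lookup is remembered under this key and the address passed on later
    // calls is ignored.
    return stubs.computeIfAbsent(key, k -> new Channel(address));
  }
}
```

Calling getChannel first with InetSocketAddress.createUnresolved(...) (simulating DNS being down) and then with a resolved address under the same key returns the same cached channel, whose call() keeps throwing UnknownHostException.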
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)