[ https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033332#comment-16033332 ]
Andrew Purtell commented on HBASE-18111:
----------------------------------------
The failure is unrelated. Unless there are objections, I will commit the v2
patch to master and branch-1 shortly.
> Replication stuck when cluster connection is closed
> ---------------------------------------------------
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch, HBASE-18111-v1.patch,
> HBASE-18111-v2.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)]
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum
> member failed: javax.security.sasl.SaslException: An error:
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException:
> GSS initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper
> Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread]
> org.apache.hadoop.hbase.client.HConnectionImplementation:
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6,
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
> baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode =
> AuthFailed
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread]
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800]
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
> Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local
> exception: java.io.IOException: Connection closed
> {code}
> jstack:
> {code}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
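> The cluster connection was aborted when the ZooKeeperWatcher received an
> AuthFailed event. After that, the HBaseInterClusterReplicationEndpoint's
> replicate() method gets stuck in a while loop: each attempt fails with
> "Connection closed", the thread sleeps in sleepForRetries(), and then it
> retries on the same closed connection.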
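
The failure mode described above can be sketched in a few lines of plain Java.
This is only an illustration under stated assumptions, not the HBase code and
not necessarily what the v2 patch does: PeerConnection, Endpoint, ship() and
the reconnectIfClosed flag are hypothetical names, and the loop is bounded by
maxAttempts so the example terminates, whereas the thread dump above shows a
loop that keeps sleeping and retrying.

```java
import java.io.IOException;

public class ReplicateRetrySketch {

    /** Hypothetical stand-in for the cluster connection to the peer. */
    static final class PeerConnection {
        private volatile boolean closed;

        boolean isClosed() {
            return closed;
        }

        /** Models the abort triggered by the AuthFailed event. */
        void abort() {
            closed = true;
        }

        /** Fails the same way the log does once the connection is closed. */
        void ship(String batch) throws IOException {
            if (closed) {
                throw new IOException("Connection closed");
            }
        }
    }

    /** Hypothetical stand-in for the replication endpoint. */
    static final class Endpoint {
        private PeerConnection conn = new PeerConnection();
        int reconnects = 0;

        PeerConnection connection() {
            return conn;
        }

        /**
         * Retries shipping a batch. Without reconnectIfClosed, every attempt
         * reuses the closed connection and fails again. With it, a closed
         * connection is replaced before the next attempt (an assumed remedy).
         */
        boolean replicate(String batch, int maxAttempts, boolean reconnectIfClosed) {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                if (reconnectIfClosed && conn.isClosed()) {
                    conn = new PeerConnection();
                    reconnects++;
                }
                try {
                    conn.ship(batch);
                    return true;
                } catch (IOException e) {
                    // The real endpoint sleeps here before retrying; omitted
                    // so the sketch runs instantly.
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Endpoint stuck = new Endpoint();
        stuck.connection().abort();
        System.out.println("without reconnect: " + stuck.replicate("edits", 5, false));

        Endpoint recovering = new Endpoint();
        recovering.connection().abort();
        System.out.println("with reconnect: " + recovering.replicate("edits", 5, true));
    }
}
```

The point of the sketch is that retrying alone cannot succeed once the
connection object itself is closed; some step has to notice isClosed() and
either replace the connection or stop retrying.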