[ 
https://issues.apache.org/jira/browse/HBASE-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299657#comment-16299657
 ] 

Austin Yan commented on HBASE-15992:
------------------------------------

Hi Mr. Andrew,
I ran into a very similar issue after my program had been running for a while
(around 20 days). Restarting the program brought it back to normal, but a few
days later the same issue recurred. My HBase version is 1.0.2.
While the problem is occurring, the log below is printed several times per
second.
Could you please advise whether this is the same issue as HBASE-15992? Is
there a patch available so far?
Thanks.

Below is the error log:
[ERROR 2017-12-20 10:07:00] {org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:725} - hconnection-0x18093947-0x940897f49f808d29, quorum=sparkstn03ts:24002,sparkstn02ts:24002,sparkstn01ts:24002, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/meta-region-server
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1611)
      at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:360)
      at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:746)
      at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:482)
      at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
      at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:600)
      at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
      at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
      at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1188)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
      at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:304)
      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
      at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
      at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
      at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1252)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1158)
      at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:304)
      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
      at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
      at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:323)
      at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:298)
      at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
      at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
      at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:807)
      at com.cmb.zz1.api.zz1hbs.ZZ1HBS_Table.scan(ZZ1HBS_Table.java:448)
      at com.cmb.zz1.api.zz1hbs.ZZ1HBS_Table.scan(ZZ1HBS_Table.java:439)
      at com.cmb.zz1.api.zz1hbs.ZZ1HBS_Table.scan(ZZ1HBS_Table.java:410)
      at com.cmb.adr.wke.adrqimls.ADRQIMLS.rtvImgRcd(ADRQIMLS.java:228)
      at com.cmb.adr.wke.adrqimls.ADRQIMLS.qryImgLst(ADRQIMLS.java:151)
      at com.cmb.adr.wke.adrqimls.ADRQIMLS.execute(ADRQIMLS.java:44)
      at com.cmb.zz1.api.zz1wkm.ZZ1WKM_Monitor.execute(ZZ1WKM_Monitor.java:150)
      at com.cmb.zz1.api.zz1wkm.ZZ1WKM_Monitor.execute(ZZ1WKM_Monitor.java:57)
      at com.cmb.zz1.api.zz1cms.ZZ1CMS_TCPWrkTask.run(ZZ1CMS_TCPWrkTask.java:39)
      at com.cmb.zz1.api.zz1thd.ZZ1THD_ThdExecutor.runWorker(ZZ1THD_ThdExecutor.java:1164)
      at com.cmb.zz1.api.zz1thd.ZZ1THD_ThdExecutor$Worker.run(ZZ1THD_ThdExecutor.java:634)
      at java.lang.Thread.run(Thread.java:745)

> Preserve original KeeperException when converted to external exceptions
> -----------------------------------------------------------------------
>
>                 Key: HBASE-15992
>                 URL: https://issues.apache.org/jira/browse/HBASE-15992
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: hbase
>    Affects Versions: 0.98.14
>            Reporter: Hari Krishna Dara
>            Priority: Minor
>              Labels: client, client-auth, zookeeper
>
> During an investigation in which we were seeing unexpected 
> {{NoServerForRegionException}} errors, the root cause turned out to be a 
> {{KeeperException}} that got lost and so resulted in a misleading top level 
> indication.
> The underlying exception with partial stacktrace is this:
> {noformat}
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed for /hbase/meta-region-server
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1289)
>       at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
>       at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
>       at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:2032)
>       at 
> org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:203)
>       at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:58)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateMeta(HConnectionManager.java:1209)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1175)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1301)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1178)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1135)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:976)
> {noformat}
> Here is some additional information:
> * The exception first gets caught 
> [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L366]
> * It gets logged and rethrown from 
> [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L279]
> * It gets caught again, logged and rethrown 
> [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L693]
> * This finally gets caught and rethrown as InterruptedException 
> [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L2037]
> When thrown as {{InterruptedException}}, the cause is lost, so [the code 
> catching 
> it|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java#L65]
>  can't (and currently doesn't) determine the cause. Perhaps the exception 
> should be preserved and passed on to [the 
> caller|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java#L1312]
>  such that it is available when finally the {{NoServerForRegionException}} is 
> thrown 
> [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java#L1281].
>  Alternatively, a more meaningful exception could also be thrown instead of a 
> misleading {{NoServerForRegionException}}, especially in cases where the 
> failure indicates a more permanent condition.
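
To illustrate the point in the description about the cause being lost when
the exception is rethrown as InterruptedException, here is a minimal sketch
in plain Java. It is not the HBase code: FakeKeeperException is a
hypothetical stand-in for org.apache.zookeeper.KeeperException, and the
method names wrapLosingCause, wrapPreservingCause and findCause are made up
for the example. The only real API relied on is Throwable.initCause and
Throwable.getCause; InterruptedException has no constructor that takes a
cause, so initCause is one way to attach it.

```java
public class CausePreservation {

    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException. */
    static class FakeKeeperException extends Exception {
        FakeKeeperException(String message) {
            super(message);
        }
    }

    /** Converts to InterruptedException and drops the original exception. */
    static InterruptedException wrapLosingCause(FakeKeeperException e) {
        return new InterruptedException("interrupted due to " + e.getMessage());
    }

    /** Converts to InterruptedException and keeps the original as the cause. */
    static InterruptedException wrapPreservingCause(FakeKeeperException e) {
        InterruptedException ie =
            new InterruptedException("interrupted due to " + e.getMessage());
        ie.initCause(e); // InterruptedException has no (String, Throwable) constructor
        return ie;
    }

    /** Walks the cause chain and returns the first throwable of the given type, or null. */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FakeKeeperException root = new FakeKeeperException(
            "KeeperErrorCode = AuthFailed for /hbase/meta-region-server");

        // Without a cause, a caller further up cannot tell what went wrong.
        System.out.println(
            findCause(wrapLosingCause(root), FakeKeeperException.class) == null);

        // With the cause attached, the caller can recover the original exception.
        System.out.println(
            findCause(wrapPreservingCause(root), FakeKeeperException.class) == root);
    }
}
```

With the cause attached, a caller such as the one that eventually throws
NoServerForRegionException could, in principle, inspect the chain and report
the underlying authentication failure instead.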



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
