[
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-3065:
-------------------------
Attachment: 3065-v3.txt
Here is a non-reversed patch with a fix for the compile error.
Liyin, would you mind taking a look to see why the tests are failing? Here is
the failure from the first test in the test suite (mvn clean test):
{code}
t/s/org.apache.hadoop.hbase.master.TestHMasterRPCException.txt
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hbase.master.TestHMasterRPCException
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.348 sec <<< FAILURE!
testRPCException(org.apache.hadoop.hbase.master.TestHMasterRPCException) Time elapsed: 0.312 sec <<< ERROR!
org.apache.hadoop.hbase.ZooKeeperConnectionException: master:57938-0x12fa82ed2230000 Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:160)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:236)
    at org.apache.hadoop.hbase.master.TestHMasterRPCException.testRPCException(TestHMasterRPCException.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:62)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:140)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:165)
    at org.apache.maven.surefire.Surefire.run(Surefire.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:289)
    at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1005)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:197)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:807)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
    ... 28 more
{code}
> Retry all 'retryable' zk operations; e.g. connection loss
> ---------------------------------------------------------
>
> Key: HBASE-3065
> URL: https://issues.apache.org/jira/browse/HBASE-3065
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Liyin Tang
> Fix For: 0.92.0
>
> Attachments: 3065-v3.txt, HBase-3065[r1088475]_1.patch,
> hbase3065_2.patch
>
>
> The 'new' master refactored our zk code, tidying up all zk accesses and
> corralling them behind nice zk utility classes. One improvement was letting
> out all KeeperExceptions so the client deals with them. That's good generally
> because in the old days we'd suppress important zk state changes. But
> there is at least one case the new zk utility could handle for the
> application, and that's the class of retryable KeeperExceptions. The one that
> comes to mind is connection loss. On connection loss we should retry the
> just-failed operation. Usually the retry will just work. At worst, on
> reconnect, we'll pick up the expired session event.
> Adding in this change shouldn't be too bad given the refactor of zk corralled
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry. We could retry on
> a timer or we could retry forever as long as the Stoppable interface is
> passed, so if another thread has stopped or aborted the hosting service, we'll
> notice and give up trying. Doing the latter is probably better than some
> kind of timeout.
> HBASE-3062 adds a timed retry on the first zk operation. This issue is about
> generalizing what is over there across all zk access.
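
The retry-until-stopped option in the description above could look roughly like the sketch below. It is an illustration only, not the code in the attached patches. To stay self-contained it does not use ZooKeeper or HBase classes: the nested `Stoppable` is a local stand-in reduced to an `isStopped()` check, `ConnectionLossException` is a local stand-in for ZooKeeper's connection-loss KeeperException, and `retryOnConnectionLoss` and `sleepMillis` are hypothetical names.

```java
import java.util.concurrent.Callable;

/** Sketch: retry an operation on connection loss until it succeeds or the host stops. */
class RetryingZkSketch {

  /** Local stand-in (assumption): only the stop check the retry loop needs. */
  interface Stoppable {
    boolean isStopped();
  }

  /** Local stand-in (assumption) for ZooKeeper's connection-loss exception. */
  static class ConnectionLossException extends Exception {
    ConnectionLossException(String msg) {
      super(msg);
    }
  }

  /**
   * Runs op, retrying whenever it fails with connection loss. There is no
   * retry limit: the loop ends only when op succeeds, op fails with some
   * other exception, or the stopper reports that the hosting service stopped.
   */
  static <T> T retryOnConnectionLoss(Callable<T> op, Stoppable stopper, long sleepMillis)
      throws Exception {
    ConnectionLossException last = null;
    while (!stopper.isStopped()) {
      try {
        return op.call();
      } catch (ConnectionLossException e) {
        // Retryable: remember the failure, pause briefly, then try again.
        last = e;
        Thread.sleep(sleepMillis);
      }
      // Any other exception (for example session expiry) propagates to the caller.
    }
    // The service was stopped or aborted; give up and surface the last failure.
    if (last != null) {
      throw last;
    }
    throw new IllegalStateException("stopped before the first attempt");
  }
}
```

A caller would wrap each zk access in a `Callable` and pass the hosting service as the stopper. Only connection loss is retried here; anything else still reaches the client, which matches the intent of letting KeeperExceptions out.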