[ 
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3065:
-------------------------

    Attachment: 3065-v3.txt

Here is a non-reversed patch with a fix for the compile error.

Would you mind taking a looksee, Liyin, to see why the tests are failing?  Here is
the failure from the first test in the test suite (mvn clean test):

{code}
t/s/org.apache.hadoop.hbase.master.TestHMasterRPCException.txt
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hbase.master.TestHMasterRPCException
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.348 sec <<< FAILURE!
testRPCException(org.apache.hadoop.hbase.master.TestHMasterRPCException)  Time elapsed: 0.312 sec  <<< ERROR!
org.apache.hadoop.hbase.ZooKeeperConnectionException: master:57938-0x12fa82ed2230000 Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:160)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:236)
    at org.apache.hadoop.hbase.master.TestHMasterRPCException.testRPCException(TestHMasterRPCException.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:62)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:140)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:165)
    at org.apache.maven.surefire.Surefire.run(Surefire.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:289)
    at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1005)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:197)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:807)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
    ... 28 more
{code}

> Retry all 'retryable' zk operations; e.g. connection loss
> ---------------------------------------------------------
>
>                 Key: HBASE-3065
>                 URL: https://issues.apache.org/jira/browse/HBASE-3065
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Liyin Tang
>             Fix For: 0.92.0
>
>         Attachments: 3065-v3.txt, HBase-3065[r1088475]_1.patch, 
> hbase3065_2.patch
>
>
> The 'new' master refactored our zk code, tidying up all zk accesses and 
> corralling them behind nice zk utility classes.  One improvement was letting 
> out all KeeperExceptions, letting the client deal.  That's good generally 
> because in the old days we'd suppress important zk state changes.  But 
> there is at least one case the new zk utility could handle for the 
> application, and that's the class of retryable KeeperExceptions.  The one that 
> comes to mind is connection loss.  On connection loss we should retry the 
> just-failed operation.  Usually the retry will just work.  At worst, on 
> reconnect, we'll pick up the expired session event. 
> Adding in this change shouldn't be too bad given the refactor of zk corralled 
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry.  We could retry on 
> a timer or we could retry forever as long as the Stoppable interface is 
> passed, so if another thread has stopped or aborted the hosting service, we'll 
> notice and give up trying.  Doing the latter is probably better than some 
> kind of timeout.
> HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
> generalizing what is over there across all zk access.
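
For illustration only, the "retry until the service is stopped" option described in the issue might look roughly like the sketch below. This is not the attached patch. Every name in it (ZkRetrySketch, RetryableException, ServiceStoppedException, ZkOperation, retryUntilStopped, pauseMillis) is hypothetical, and the isStopped supplier merely stands in for a Stoppable-style check. A non-retryable failure, such as the session expiry in the trace above, would pass straight through to the caller.

```java
import java.util.function.BooleanSupplier;

/** Sketch only: every name here is hypothetical, not HBase or ZooKeeper API. */
final class ZkRetrySketch {

    /** Stand-in for a retryable failure such as a zk connection loss. */
    static class RetryableException extends Exception {
        RetryableException(String message) { super(message); }
    }

    /** Thrown when the hosting service stopped before the operation succeeded. */
    static class ServiceStoppedException extends Exception {
        ServiceStoppedException(Throwable cause) {
            super("service stopped; giving up on retries", cause);
        }
    }

    /** A zk-style operation that may fail with a retryable error. */
    interface ZkOperation<T> {
        T call() throws Exception;
    }

    /**
     * Runs op, retrying on RetryableException until it succeeds or until
     * isStopped reports that the hosting service was stopped or aborted.
     * Any other exception is not retried and propagates to the caller.
     */
    static <T> T retryUntilStopped(ZkOperation<T> op,
                                   BooleanSupplier isStopped,
                                   long pauseMillis) throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (RetryableException e) {
                if (isStopped.getAsBoolean()) {
                    // Another thread stopped the service: stop retrying.
                    throw new ServiceStoppedException(e);
                }
                // Assumed fixed pause between attempts; a real policy could back off.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

A caller would wrap each zk access, for example retryUntilStopped(() -> doExists(path), service::isStopped, 1000L), where doExists and service are likewise placeholders.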

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
