[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154933#comment-13154933 ]
nkeywal commented on HBASE-4832:
--------------------------------

One small comment: the timeout on the method (@Test(timeout=timeout)) conflicts with the timeout of the sleep (Thread.sleep(timeout)). Both are set to the same value (30 seconds), so either one can fire first, which makes failure analysis more complex. I think we can remove the timeout on the method; the test itself ensures that it won't last forever.

> TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-4832
>                 URL: https://issues.apache.org/jira/browse/HBASE-4832
>             Project: HBase
>          Issue Type: Bug
>          Components: coprocessors, test
>    Affects Versions: 0.94.0
>            Reporter: nkeywal
>            Assignee: Eugene Koontz
>            Priority: Minor
>         Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch
>
>
> The current implementation of HRegionServer#stop is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     synchronized (this) {
>       // Wakes run() if it is sleeping
>       notifyAll(); // FindBugs NN_NAKED_NOTIFY
>     }
>   }
> {noformat}
> The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     // Wakes run() if it is sleeping
>     sleeper.skipSleepCycle();
>   }
> {noformat}
> Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests.
> However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work.
> This is likely because the code does not expect the region server to stop that fast.
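
The wrong-monitor problem described in the quoted text can be reproduced outside HBase. The following is only a minimal sketch: SimpleSleeper and Server are hypothetical stand-ins written for this illustration, not the real HBase Sleeper or HRegionServer, and the sketch assumes that the run loop sleeps by waiting on a lock owned by the sleeper rather than on the server object itself.

```java
import java.util.concurrent.TimeUnit;

public class WrongMonitorDemo {

  /** Hypothetical stand-in for a sleeper that waits on its own private lock. */
  static final class SimpleSleeper {
    private final Object sleepLock = new Object();
    private boolean skip = false;

    /** Waits up to periodMs, or returns early once skipSleepCycle() has been called. */
    void sleep(long periodMs) throws InterruptedException {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
      synchronized (sleepLock) {
        while (!skip) {
          long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
          if (remainingMs <= 0) {
            break; // period elapsed
          }
          sleepLock.wait(remainingMs);
        }
        skip = false;
      }
    }

    /** Wakes a thread blocked in sleep() by notifying the lock it actually waits on. */
    void skipSleepCycle() {
      synchronized (sleepLock) {
        skip = true;
        sleepLock.notifyAll();
      }
    }
  }

  /** Hypothetical stand-in for a server whose run loop sleeps between iterations. */
  static final class Server implements Runnable {
    private final SimpleSleeper sleeper = new SimpleSleeper();
    private final long periodMs;
    private volatile boolean stopped = false;

    Server(long periodMs) {
      this.periodMs = periodMs;
    }

    @Override
    public void run() {
      try {
        while (!stopped) {
          sleeper.sleep(periodMs);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    /** Same shape as the first quoted stop(): notifies a monitor nobody waits on. */
    void stopNotifyingThis() {
      stopped = true;
      synchronized (this) {
        notifyAll();
      }
    }

    /** Same shape as the second quoted stop(): wakes the sleeper directly. */
    void stopSkippingSleep() {
      stopped = true;
      sleeper.skipSleepCycle();
    }
  }

  /** Returns the milliseconds between the stop call and the run loop exiting. */
  static long millisToStop(boolean wakeSleeper) throws InterruptedException {
    Server server = new Server(1000);
    Thread runner = new Thread(server);
    runner.start();
    Thread.sleep(100); // give the run loop time to enter its sleep
    long start = System.nanoTime();
    if (wakeSleeper) {
      server.stopSkippingSleep();
    } else {
      server.stopNotifyingThis();
    }
    runner.join();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("notifyAll() on this: " + millisToStop(false) + " ms");
    System.out.println("skipSleepCycle():    " + millisToStop(true) + " ms");
  }
}
```

In this sketch the first variant only exits once the current sleep period runs out, while the second returns almost at once. Exact timings depend on the machine, so no figures are given here.
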
> The exception is:
> {noformat}
> testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30000 milliseconds
>   at java.lang.Throwable.fillInStackTrace(Native Method)
>   at java.lang.Throwable.<init>(Throwable.java:196)
>   at java.lang.Exception.<init>(Exception.java:41)
>   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
>   at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
>   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
>   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
>   at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
>   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
>   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
>   at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
>   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
>   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
>   at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
> {noformat}
> We have this exception because we entered a loop of retries.

--
This message is automatically generated by JIRA.