[ 
https://issues.apache.org/jira/browse/HBASE-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189035#comment-14189035
 ] 

Dima Spivak commented on HBASE-9712:
------------------------------------

Some more debugging: all recent failures are caused by testLogFilesAreArchived 
(compare failure history of 
[testLogFilesAreArchived|https://builds.apache.org/job/HBase-0.98/lastSuccessfulBuild/testReport/junit/org.apache.hadoop.hbase.master/TestSplitLogManager/testLogFilesAreArchived/history/]
 and [TestSplitLogManager 
itself|https://builds.apache.org/job/HBase-0.98/lastSuccessfulBuild/testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/history/]).

The failure may be legit. [Output from a failed 
run|https://builds.apache.org/job/HBase-0.98/635/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.TestSplitLogManager-output.txt]
 suggests that things start going south once WAL splitting begins:
{code}
2014-10-29 02:45:30,670 DEBUG [Thread-27] master.SplitLogManager(323): Scheduling batch of logs to split
2014-10-29 02:45:30,670 INFO  [Thread-27] master.SplitLogManager(325): started splitting 1 logs in [/home/jenkins/jenkins-slave/workspace/HBase-0.98/hbase-server/target/test-data/5d71199c-6df9-4aa6-800f-9634b546ecef/testLogFilesAreArchived/595bcd1c-d510-464c-8207-33e8580aee96]
2014-10-29 02:45:30,672 WARN  [Thread-31] master.TestSplitLogManager$5(527): org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1
2014-10-29 02:45:30,672 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(703): put up splitlog task at znode /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1
2014-10-29 02:45:30,674 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(745): task not yet acquired /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1 ver = 0
{code}

For splits that succeed, the task is acquired quickly and completes within a 
few tens of milliseconds. In failing runs, the unassigned task appears to be 
resubmitted repeatedly by splitLogManagerTimeoutMonitor; tasks like 
/hbase/splitWAL/RESCAN0000000001 do enter the DONE state, but the splitlog 
task never does and times out after 60 s. Can someone more familiar with what 
the SplitLogManager is doing here chime in?
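
In the excerpt above, the NoNode warning from Thread-31 is logged in the same millisecond as, and on the line before, the "put up splitlog task" message for the same znode. Below is a rough sketch of a script that flags this ordering in a test-output file. The helper names (parse, reads_before_creation) are hypothetical, the message substrings are copied from the excerpt, and the regular expression assumes the line layout shown there. Line order across threads is only a hint, not proof of what ZooKeeper actually saw.

```python
import re
from datetime import datetime

# Assumed layout: "<timestamp> <LEVEL> [<thread>] <source(line)>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+\[([^\]]+)\] (\S+): (.*)$"
)
ZNODE_RE = re.compile(r"/hbase/splitWAL/\S+")

# Three lines copied from the failed-run excerpt (email line wraps joined).
SAMPLE = """\
2014-10-29 02:45:30,672 WARN  [Thread-31] master.TestSplitLogManager$5(527): org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1
2014-10-29 02:45:30,672 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(703): put up splitlog task at znode /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1
2014-10-29 02:45:30,674 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(745): task not yet acquired /hbase/splitWAL/595bcd1c-d510-464c-8207-33e8580aee96%2Ffoo%2C1%2C1 ver = 0
"""


def parse(log_text):
    """Turn matching log lines into dicts; lines that do not match are skipped."""
    events = []
    for index, line in enumerate(log_text.splitlines()):
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, level, thread, source, message = match.groups()
        events.append({
            "index": index,  # position in the file, used to break timestamp ties
            "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
            "level": level,
            "thread": thread,
            "source": source,
            "message": message,
        })
    return events


def reads_before_creation(events):
    """Return (znode, reader_thread, creator_thread) for each znode whose first
    NoNode message is logged no later than its first 'put up' message."""
    first_missing, first_created = {}, {}
    for event in events:
        found = ZNODE_RE.search(event["message"])
        if not found:
            continue
        znode = found.group(0)
        if "NoNode for" in event["message"]:
            first_missing.setdefault(znode, event)
        elif "put up splitlog task at znode" in event["message"]:
            first_created.setdefault(znode, event)

    flagged = []
    for znode, missing in first_missing.items():
        created = first_created.get(znode)
        if created is None or (
            (missing["time"], missing["index"]) <= (created["time"], created["index"])
        ):
            flagged.append(
                (znode, missing["thread"], created["thread"] if created else None)
            )
    return flagged


if __name__ == "__main__":
    for znode, reader, creator in reads_before_creation(parse(SAMPLE)):
        print(f"{reader} hit NoNode on {znode} before {creator} logged creating it")
```

To run it against a full report, read the surefire output file into a string and pass that to parse() in place of SAMPLE. Note that wrapped lines, as in this email, would need to be joined first.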

> TestSplitLogManager still fails on occasion
> -------------------------------------------
>
>                 Key: HBASE-9712
>                 URL: https://issues.apache.org/jira/browse/HBASE-9712
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 
> org.apache.hadoop.hbase.master.TestSplitLogManager-output (1).txt
>
>
> Opening this issue to keep account of failures.  It failed for me locally 
> just now.
> Failed tests: testTaskResigned(org.apache.hadoop.hbase.master.TestSplitLogManager): version1=2, version=2
> {code}
> durruti:hbase stack$ more hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.TestSplitLogManager.txt
> -------------------------------------------------------------------------------
> Test set: org.apache.hadoop.hbase.master.TestSplitLogManager
> -------------------------------------------------------------------------------
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 86.697 sec <<< FAILURE!
> testTaskResigned(org.apache.hadoop.hbase.master.TestSplitLogManager)  Time elapsed: 0.004 sec  <<< FAILURE!
> java.lang.AssertionError: version1=2, version=2
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.assertTrue(Assert.java:41)
>         at org.apache.hadoop.hbase.master.TestSplitLogManager.testTaskResigned(TestSplitLogManager.java:387)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.junit.runners.Suite.runChild(Suite.java:127)
>         at org.junit.runners.Suite.runChild(Suite.java:26)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:680)
> {code}
> Let me attach the log



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
