[ https://issues.apache.org/jira/browse/HBASE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797171#comment-13797171 ]

stack commented on HBASE-9024:
------------------------------

This test is still problematic.  It is currently our most frequent failure.
https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.96-hadoop2/89/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnDatanodeDeath/

This is a failure in a different test than the one named in this summary.  
HBASE-8349 did some fixup of the above failure, but the URL above shows it 
happening again.  Let me add some debug.
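For reference, the quoted dump below blocks in AsyncProcess's wait-on-counter pattern.  Here is a minimal sketch of that pattern (illustrative only -- names mirror the stack trace, but this is not HBase's actual code): if a completing task ever fails to decrement the counter, the timed wait re-checks and loops forever, which is the TIMED_WAITING zombie we see.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: method names echo the stack trace
// (waitForMaximumCurrentTasks), but this is NOT HBase's implementation.
public class WaitForTasksSketch {
  private final AtomicLong tasksInProgress = new AtomicLong(0);

  // Block until the number of in-flight tasks drops to `max`.
  // The timed wait matches the TIMED_WAITING state in the dump; if the
  // counter is never decremented, this loop never exits.
  void waitForMaximumCurrentTasks(long max) throws InterruptedException {
    synchronized (tasksInProgress) {
      while (tasksInProgress.get() > max) {
        tasksInProgress.wait(100);
      }
    }
  }

  void taskStarted() {
    tasksInProgress.incrementAndGet();
  }

  // A completing task must decrement AND notify, or the waiter hangs.
  void taskFinished() {
    synchronized (tasksInProgress) {
      tasksInProgress.decrementAndGet();
      tasksInProgress.notifyAll();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    WaitForTasksSketch s = new WaitForTasksSketch();
    s.taskStarted();
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(50); // simulate the task doing work
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      s.taskFinished();
    });
    worker.start();
    s.waitForMaximumCurrentTasks(0); // returns once the task completes
    worker.join();
    System.out.println("tasks=" + s.tasksInProgress.get());
  }
}
```

The debug I want to add would log the counter value each time through that loop, so a stuck run tells us whether the count was left non-zero.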

> TestLogRolling.testLogRollOnPipelineRestart fails/goes zombie
> -------------------------------------------------------------
>
>                 Key: HBASE-9024
>                 URL: https://issues.apache.org/jira/browse/HBASE-9024
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: stack
>
> TestLogRolling.testLogRollOnPipelineRestart failed on hadoop1 here: 
> https://builds.apache.org/job/hbase-0.95/352/consoleText It went zombie.
> In the double thread dump on the end:
> {code}
> "pool-1-thread-1" prio=10 tid=0x73f9dc00 nid=0x3a34 in Object.wait() 
> [0x7517d000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0xcf624ad0> (a java.util.concurrent.atomic.AtomicLong)
>       at 
> org.apache.hadoop.hbase.client.AsyncProcess.waitForNextTaskDone(AsyncProcess.java:634)
>       - locked <0xcf624ad0> (a java.util.concurrent.atomic.AtomicLong)
>       at 
> org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:659)
>       at 
> org.apache.hadoop.hbase.client.AsyncProcess.waitUntilDone(AsyncProcess.java:670)
>       at 
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:813)
>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1170)
>       at org.apache.hadoop.hbase.client.HTable.put(HTable.java:753)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.doPut(TestLogRolling.java:640)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.writeData(TestLogRolling.java:248)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnPipelineRestart(TestLogRolling.java:515)
> {code}
> ... we are stuck here.
> The math looks like it could go wonky.  But looking at the output for the 
> test, it seems that when this test ran we got this:
> {code}
> 2013-07-23 01:23:29,560 INFO [pool-1-thread-1] 
> hbase.HBaseTestingUtility(922): Minicluster is down
> 2013-07-23 01:23:29,574 INFO [pool-1-thread-1] hbase.ResourceChecker(171): 
> after: regionserver.wal.TestLogRolling#testLogRollOnPipelineRestart Thread=39 
> (was 31) - Thread LEAK? -, OpenFileDescriptor=312 (was 272) - 
> OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), 
> SystemLoadAverage=351 (was 368), ProcessCount=144 (was 142) - ProcessCount 
> LEAK? -, AvailableMemoryMB=906 (was 1995), ConnectionCount=0 (was 0)
> {code}
> This test has a history of failures.  See HBASE-5995, where it was fixed and 
> reenabled once.  The thought was that it was a hadoop2 issue, but this cited 
> failure is on hadoop1.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
