[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733266#comment-14733266
 ] 

stack commented on HBASE-14317:
-------------------------------

Recent 1.2 builds from before this patch went in show that the tests
cited above were already problematic:


kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/151/jdk=latest1.7,label=Hadoop/consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.client.TestHCM
Hanging test : org.apache.hadoop.hbase.regionserver.TestHRegion
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController
Hanging test : org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer
Printing Failing tests
Failing test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint

or 



kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/150/jdk=latest1.7,label=Hadoop/consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
Failing test : org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss
Failing test : org.apache.hadoop.hbase.replication.TestReplicationEndpoint


1.2 builds have been failing for a while. Will be back to fix failures.
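
For anyone reading the output above without the script to hand, here is a rough sketch of the kind of check involved. It is not the actual code of dev-support/findHangingTests.py. It assumes surefire-style console lines ("Running <class>" when a test class starts, "Tests run: ... - in <class>" when it reports), and the function name classify_tests is made up for illustration. A class that starts but never reports is treated as hanging; a class whose result line carries a FAILURE! or ERROR! marker is treated as failing.

```python
import re

# Assumed surefire-style console lines (an assumption, not taken from the script):
#   Running org.example.TestFoo
#   Tests run: 3, Failures: 0, ... Time elapsed: 1.2 sec - in org.example.TestFoo
RUNNING = re.compile(r"^Running (\S+)$")
RESULT = re.compile(r"^Tests run:.*? - in (\S+)$")


def classify_tests(console_text):
    """Return (hanging, failing) lists of test class names.

    hanging: classes that started but never printed a result line.
    failing: classes whose result line is flagged FAILURE! or ERROR!.
    """
    started = []      # keeps first-seen order
    finished = set()
    failed = set()
    for raw in console_text.splitlines():
        line = raw.strip()
        m = RUNNING.match(line)
        if m:
            if m.group(1) not in started:
                started.append(m.group(1))
            continue
        m = RESULT.match(line)
        if m:
            finished.add(m.group(1))
            if "FAILURE!" in line or "ERROR!" in line:
                failed.add(m.group(1))
    hanging = [t for t in started if t not in finished]
    failing = [t for t in started if t in failed]
    return hanging, failing
```

Running this over the consoleText of two consecutive builds and intersecting the hanging lists would be one way to separate tests that were already flaky from ones a new patch broke.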



> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14317.branch-1.txt, 14317.branch-1.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt, 
> 14317v11.txt, 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
