[ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717302#comment-14717302 ]
Elliott Clark commented on HBASE-14317:
---------------------------------------

Just saw something very like this too. Flushes are waiting on getting a committed seq then failing. The append thread is just stuck.

{code}
Thread 125 (regionserver/hbase4537.frc3.facebook.com/10.210.81.27:16020.append-pool1-t1):
  State: TIMED_WAITING
  Blocked count: 239951
  Waited count: 37873297
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:460)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1786)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1761)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
    com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
{code}

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 1.1.1
>            Reporter: stack
>            Priority: Critical
>         Attachments: [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, raw.php, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck.
> See attached thread dump and associated log. What is interesting is that
> syncers are waiting to take syncs to run and at same time we want to flush so
> we are waiting on a safe point but there seems to be nothing in our ring
> buffer; did we go to roll log and not add safe point sync to clear out
> ringbuffer?
> Needs a bit of study. Try to reproduce.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)