[ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715328#comment-14715328 ]
stack commented on HBASE-14317:
-------------------------------
Is the concurrent closing of regions, which are waiting on a safe point:
{code}
"RS_CLOSE_REGION-r12s16:9104-1" #33639 prio=5 os_prio=0 tid=0x00007fbf546fe000 nid=0x563 in Object.wait() [0x00007fbf38107000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
        - locked <0x000000056baa4888> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
        - locked <0x000000056baa4888> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
        - locked <0x000000056baaf928> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
... and then the FATAL roll of logs happening at the same time, the issue? Needs digging.
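For anyone reading the trace, here is a minimal sketch of the kind of monitor wait the close thread appears to be parked in. This is not the HRegion source: the class name CloseWaitSketch, the WriteState fields, the timeout parameter, and finishFlush are all assumptions made for illustration. The point is only that a thread in such a loop stays in Object.wait() until whoever finishes the flush or compaction updates the state and notifies.

```java
import java.util.concurrent.TimeUnit;

public class CloseWaitSketch {
    // Hypothetical stand-in for HRegion$WriteState; field names are assumptions.
    static final class WriteState {
        boolean flushing = false;
        int compacting = 0;
    }

    /**
     * Waits on the WriteState monitor until no flush or compaction is in
     * progress. Returns true if the state cleared, false if timeoutMs elapsed
     * first. (The timeout exists only so the sketch cannot hang.)
     */
    static boolean waitForFlushesAndCompactions(WriteState ws, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (ws) {
            while (ws.flushing || ws.compacting > 0) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // still flushing/compacting: the closer would stay stuck
                }
                ws.wait(remainingMs); // releases the monitor while waiting
            }
            return true;
        }
    }

    /** Called by the (hypothetical) flusher when it completes; wakes any waiting closer. */
    static void finishFlush(WriteState ws) {
        synchronized (ws) {
            ws.flushing = false;
            ws.notifyAll();
        }
    }
}
```

If the flush itself is blocked behind a WAL that cannot sync, finishFlush is never reached and the closer waits indefinitely, which would match the WAITING state in the dump.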
> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: stack
> Attachments: [Java] RS stuck on WAL sync to a dead DN -
> Pastebin.com.html, raw.php, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll the logs because we can't append (see HDFS-8960), but we get stuck.
> See the attached thread dump and associated log. What is interesting is that the
> syncers are waiting to take syncs to run while, at the same time, we want to flush, so
> we are waiting on a safe point, yet there seems to be nothing in our ring
> buffer. Did we go to roll the log without adding a safe point sync to clear out the
> ring buffer?
> Needs a bit of study. Try to reproduce.
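The hypothesis in the description can be sketched as a two-latch handshake between the thread that wants to roll and the ring buffer consumer. This is an illustration only, not FSHLog code: SafePointSketch, ZigZagLatch, rollWithSafePoint, startConsumer, the use of a BlockingQueue in place of the ring buffer, and the timeout are all assumptions. It shows that if the safe point marker is never published, the consumer has nothing to act on and the roller never sees the safe point attained.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SafePointSketch {
    // Hypothetical two-latch handshake; names are assumptions, not HBase API.
    static final class ZigZagLatch {
        final CountDownLatch attained = new CountDownLatch(1); // consumer -> roller
        final CountDownLatch released = new CountDownLatch(1); // roller -> consumer
    }

    /**
     * Roller side: optionally publish a safe point marker, then wait for the
     * consumer to reach it. Returns false if the safe point was not attained
     * within timeoutMs (a real unbounded wait would hang instead).
     */
    static boolean rollWithSafePoint(BlockingQueue<ZigZagLatch> ring,
                                     boolean publishMarker,
                                     long timeoutMs) throws InterruptedException {
        ZigZagLatch latch = new ZigZagLatch();
        if (publishMarker) {
            ring.put(latch); // without this, the consumer never learns a roll is pending
        }
        boolean ok = latch.attained.await(timeoutMs, TimeUnit.MILLISECONDS);
        // ... the log would be swapped here while the consumer is paused ...
        latch.released.countDown(); // let the consumer continue
        return ok;
    }

    /** Consumer side: on each marker, signal "attained" and pause until released. */
    static Thread startConsumer(BlockingQueue<ZigZagLatch> ring) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    ZigZagLatch l = ring.take();
                    l.attained.countDown();
                    l.released.await();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on interrupt
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With a java.util.concurrent.LinkedBlockingQueue as the queue, calling rollWithSafePoint with publishMarker set to true completes the handshake, while calling it with false times out because the queue stays empty. That would be consistent with a thread waiting on a safe point while the ring buffer looks empty, but it is only one possible explanation.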