[ 
https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583685#comment-15583685
 ] 

Enis Soztutar commented on HBASE-16824:
---------------------------------------

There seems to be a deadlock with this. Some of the Appender threads are blocked in sync(), waiting on a SyncFuture:
{code}
"22" #222 daemon prio=5 os_prio=31 tid=0x00007fd2063e1800 nid=0x19403 in Object.wait() [0x000070000f373000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:159)
        - locked <0x00000006c45563d8> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:641)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:765)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:807)
        at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:168)
{code}

Other Appender threads are parked in rollWriter(), waiting to acquire a ReentrantLock:
{code}
"21" #221 daemon prio=5 os_prio=31 tid=0x00007fd205fe7800 nid=0x19203 waiting on condition [0x000070000f270000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c442d150> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:664)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:426)
        at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:153)
{code}

The syncer threads are parked in take() on their queue:
{code}
"sync.4" #198 daemon prio=5 os_prio=31 tid=0x00007fd20a231800 nid=0x16603 waiting on condition [0x000070000dc2e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c4598028> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:609)
        at java.lang.Thread.run(Thread.java:745)
{code}

And the RingBufferEventHandler (RBEH) is waiting in attainSafePoint():
{code}
"Time-limited test.append-pool1-t1" #199 daemon prio=5 os_prio=31 tid=0x00007fd207f54000 nid=0x15c03 in Object.wait() [0x000070000d71f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1129)
        - locked <0x00000006c45af270> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1095)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:946)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

I am trying to understand how this happens and will report back.
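
For anyone who wants to bucket threads the same way from inside the test JVM rather than reading a full jstack dump, here is a minimal sketch using the standard java.lang.management API. It is not part of the patch: the class name ThreadGroupingSketch and the choice of "state plus top stack frame" as the grouping key are assumptions made purely for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not HBase code: counts live threads by
// "<state> at <top frame>" so that groups of stuck threads stand out.
public class ThreadGroupingSketch {

  public static Map<String, Integer> groupByTopFrame() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    Map<String, Integer> groups = new TreeMap<>();
    // false, false: do not collect locked monitors or ownable synchronizers
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      if (info == null) {
        continue; // defensive: skip entries for threads that no longer exist
      }
      StackTraceElement[] stack = info.getStackTrace();
      String top = stack.length == 0
          ? "(no frames)"
          : stack[0].getClassName() + "." + stack[0].getMethodName();
      groups.merge(info.getThreadState() + " at " + top, 1, Integer::sum);
    }
    return groups;
  }

  public static void main(String[] args) {
    groupByTopFrame().forEach((key, count) -> System.out.println(count + "  " + key));
  }
}
```

Note that ThreadMXBean.findDeadlockedThreads() only reports cycles of threads waiting to acquire monitors or ownable synchronizers, so it may well return nothing for a hang where some of the threads sit in Object.wait() or in a queue take(), as in the dumps above.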

> Writer.flush() can be called on already closed streams in WAL roll
> ------------------------------------------------------------------
>
>                 Key: HBASE-16824
>                 URL: https://issues.apache.org/jira/browse/HBASE-16824
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Atri Sharma
>            Assignee: Enis Soztutar
>         Attachments: hbase-16824_v1.patch
>
>
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an 
> async thread calls flush on a WAL record already closed as the WAL is being 
> rotated. This JIRA investigates if setting the new WAL record path as the 
> first operation during WAL rotation will fix the issue.



