[ https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583685#comment-15583685 ]
Enis Soztutar commented on HBASE-16824:
---------------------------------------

There seems to be a deadlock happening with this. Some threads look like this:
{code}
"22" #222 daemon prio=5 os_prio=31 tid=0x00007fd2063e1800 nid=0x19403 in Object.wait() [0x000070000f373000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:159)
        - locked <0x00000006c45563d8> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:641)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:765)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:807)
        at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:168)
{code}

Others:
{code}
"21" #221 daemon prio=5 os_prio=31 tid=0x00007fd205fe7800 nid=0x19203 waiting on condition [0x000070000f270000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000006c442d150> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:664)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:426)
        at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:153)
{code}

Syncers:
{code}
"sync.4" #198 daemon prio=5 os_prio=31 tid=0x00007fd20a231800 nid=0x16603 waiting on condition [0x000070000dc2e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000006c4598028> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:609)
        at java.lang.Thread.run(Thread.java:745)
{code}

And the RingBufferEventHandler (RBEH):
{code}
"Time-limited test.append-pool1-t1" #199 daemon prio=5 os_prio=31 tid=0x00007fd207f54000 nid=0x15c03 in Object.wait() [0x000070000d71f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1129)
        - locked <0x00000006c45af270> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1095)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:946)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

Trying to understand how this happens. Will report back.

> Writer.flush() can be called on already closed streams in WAL roll
> ------------------------------------------------------------------
>
>                 Key: HBASE-16824
>                 URL: https://issues.apache.org/jira/browse/HBASE-16824
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Atri Sharma
>            Assignee: Enis Soztutar
>         Attachments: hbase-16824_v1.patch
>
>
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an
> async thread calls flush on a WAL record already closed as the WAL is being
> rotated. This JIRA investigates if setting the new WAL record path as the
> first operation during WAL rotation will fix the issue.