[ https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583685#comment-15583685 ]
Enis Soztutar commented on HBASE-16824:
---------------------------------------
There seems to be a deadlock with this.
Some threads look like this:
{code}
"22" #222 daemon prio=5 os_prio=31 tid=0x00007fd2063e1800 nid=0x19403 in Object.wait() [0x000070000f373000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:159)
- locked <0x00000006c45563d8> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:641)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:765)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:807)
at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:168)
{code}
Others:
{code}
"21" #221 daemon prio=5 os_prio=31 tid=0x00007fd205fe7800 nid=0x19203 waiting on condition [0x000070000f270000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c442d150> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:664)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:426)
at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$Appender.run(TestLogRollingNoCluster.java:153)
{code}
Syncers:
{code}
"sync.4" #198 daemon prio=5 os_prio=31 tid=0x00007fd20a231800 nid=0x16603 waiting on condition [0x000070000dc2e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c4598028> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:609)
at java.lang.Thread.run(Thread.java:745)
{code}
And the RingBufferEventHandler (RBEH):
{code}
"Time-limited test.append-pool1-t1" #199 daemon prio=5 os_prio=31 tid=0x00007fd207f54000 nid=0x15c03 in Object.wait() [0x000070000d71f000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1129)
- locked <0x00000006c45af270> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1095)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:946)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
Still trying to understand how this happens. Will report back.
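For context, the ordering question raised in the issue description quoted below (a flush reaching a writer that a WAL roll has already closed) can be illustrated with a small, deterministic sketch. This is not HBase code and says nothing about the cause of the deadlock in the dumps above: `WalRollOrderSketch`, `FakeWriter`, `tryFlush`, `rollCloseFirst` and `rollSwapFirst` are all hypothetical names, and the `midRoll` callback only simulates a sync thread that happens to run in the middle of a roll.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the HBase implementation.
class WalRollOrderSketch {
    /** Stand-in for a WAL writer; flush() fails once close() has been called. */
    static final class FakeWriter {
        private volatile boolean closed;

        void flush() {
            if (closed) {
                throw new IllegalStateException("flush on closed writer");
            }
        }

        void close() {
            closed = true;
        }
    }

    private final AtomicReference<FakeWriter> current =
        new AtomicReference<>(new FakeWriter());

    /** What a sync thread does: look up the current writer, then flush it. */
    public boolean tryFlush() {
        try {
            current.get().flush();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Close the old writer first; midRoll runs while 'current' is already closed. */
    public void rollCloseFirst(Runnable midRoll) {
        current.get().close();
        midRoll.run();
        current.set(new FakeWriter());
    }

    /** Publish the new writer first; midRoll already sees the new, open writer. */
    public void rollSwapFirst(Runnable midRoll) {
        FakeWriter old = current.getAndSet(new FakeWriter());
        midRoll.run();
        old.close();
    }

    public static void main(String[] args) {
        boolean[] ok = new boolean[2];

        WalRollOrderSketch closeFirst = new WalRollOrderSketch();
        closeFirst.rollCloseFirst(() -> ok[0] = closeFirst.tryFlush());

        WalRollOrderSketch swapFirst = new WalRollOrderSketch();
        swapFirst.rollSwapFirst(() -> ok[1] = swapFirst.tryFlush());

        System.out.println("close-first: flush succeeded = " + ok[0]);
        System.out.println("swap-first:  flush succeeded = " + ok[1]);
    }
}
```

In this sketch the close-first ordering lets the simulated flush hit a closed writer, while the swap-first ordering does not. Note that swap-first alone would not protect a thread that fetched the old writer before the swap and flushes it after the close; whether that case matters here depends on how the real syncers obtain the writer.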
> Writer.flush() can be called on already closed streams in WAL roll
> ------------------------------------------------------------------
>
> Key: HBASE-16824
> URL: https://issues.apache.org/jira/browse/HBASE-16824
> Project: HBase
> Issue Type: Bug
> Reporter: Atri Sharma
> Assignee: Enis Soztutar
> Attachments: hbase-16824_v1.patch
>
>
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an
> async thread calls flush on a WAL record already closed as the WAL is being
> rotated. This JIRA investigates if setting the new WAL record path as the
> first operation during WAL rotation will fix the issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)