[
https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567496#comment-13567496
]
Anoop Sam John commented on HBASE-7728:
---------------------------------------
The LogRoller thread is trying to roll over the current log file. It has already
acquired the updateLock:
{code}
HLog#rollWriter(boolean force)
  synchronized (updateLock) {
    // Clean up current writer.
    Path oldFile = cleanupCurrentWriter(currentFilenum);
    this.writer = nextWriter;
    ....
  }
{code}
As part of cleaning up the current writer, this thread tries to sync the pending
writes:
{code}
HLog#cleanupCurrentWriter() {
  ....
  sync();
  ....
  this.writer.close();
}
{code}
At the same time, the logSyncer thread was doing a deferred log sync operation:
{code}
HLog#syncer(long txid) {
  ...
  synchronized (flushLock) {
    ....
    try {
      logSyncerThread.hlogFlush(tempWriter, pending);
    } catch (IOException io) {
      synchronized (this.updateLock) {
        // HBASE-4387, HBASE-5623, retry with updateLock held
        tempWriter = this.writer;
        logSyncerThread.hlogFlush(tempWriter, pending);
      }
    }
  }
{code}
This thread is trying to grab the updateLock while holding the flushLock. At the
same time, the roller thread, as part of its cleanup sync (rollWriter ->
cleanupCurrentWriter -> sync -> syncer), tries to grab the flushLock while
already holding the updateLock. The two threads take the locks in opposite
order, so they deadlock.
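As a minimal standalone sketch (plain Java, hypothetical class and thread names;
not HBase code), this is the opposite-order lock acquisition the two threads
fall into:
{code}
// Illustration only: two threads taking the same two locks in opposite order.
public class LockOrderDeadlock {
  private static final Object updateLock = new Object();
  private static final Object flushLock = new Object();

  public static void main(String[] args) {
    Thread roller = new Thread(() -> {
      synchronized (updateLock) {      // rollWriter holds updateLock
        sleepQuietly(100);
        synchronized (flushLock) {     // cleanup sync then needs flushLock
          System.out.println("roller done");
        }
      }
    }, "logRoller");

    Thread syncer = new Thread(() -> {
      synchronized (flushLock) {       // syncer holds flushLock
        sleepQuietly(100);
        synchronized (updateLock) {    // IOE retry path then needs updateLock
          System.out.println("syncer done");
        }
      }
    }, "logSyncer");

    roller.start();
    syncer.start();                    // with the sleeps, both block forever
  }

  private static void sleepQuietly(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}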
An IOException might have happened in the logSyncer thread
(logSyncerThread.hlogFlush). At this point the assumption is that a log rollover
has already happened; that is why we retry the write with the updateLock held,
fetching the writer again. [The writer on which the IOE happened should have
been closed.]
In the roller thread, however, the writer close happens after the cleanup sync.
So I guess logSyncerThread.hlogFlush threw the IOE for some reason other than a
log roll. Instead of assuming a log roll in the catch block, can we check
whether tempWriter == this.writer?
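A rough sketch of what that check could look like inside the existing catch
block (my reading of the suggestion only, not a tested patch; rethrowing when
the writer has not changed is an assumption):
{code}
} catch (IOException io) {
  synchronized (this.updateLock) {
    // Retry only if a roll actually swapped the writer out from under us;
    // otherwise the IOE was not caused by a roll, so propagate it instead
    // of retrying blindly on the same writer.
    if (tempWriter != this.writer) {
      tempWriter = this.writer;
      logSyncerThread.hlogFlush(tempWriter, pending);
    } else {
      throw io;
    }
  }
}
{code}
This only illustrates the tempWriter == this.writer check; it does not by itself
change the flushLock/updateLock ordering that produces the deadlock.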
I am not an expert in this area; I am just adding my observation from a quick
code study. If it is wrong, please correct me. Do you have any logs from when
this happened?
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
> Key: HBASE-7728
> URL: https://issues.apache.org/jira/browse/HBASE-7728
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.2
> Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
> Reporter: Wang Qiang
> Priority: Blocker
>
> the hlog roller thread and hlog syncer thread may deadlock on the 'flushLock'
> and 'updateLock', and then cause all 'IPC Server handler' threads to block on
> hlog append. The jstack info is as follows:
> "regionserver60020.logRoller":
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
> - waiting to lock <0x000000067bf88d58> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
> - locked <0x000000067d54ace0> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> at java.lang.Thread.run(Thread.java:662)
> "regionserver60020.logSyncer":
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
> - waiting to lock <0x000000067d54ace0> (a java.lang.Object)
> - locked <0x000000067bf88d58> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
> at java.lang.Thread.run(Thread.java:662)