[
https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569069#comment-13569069
]
Lars Hofhansl commented on HBASE-7728:
--------------------------------------
That is why we should not change that. What I am saying is that if you access
the writer inside the updateLock in syncer(), it will not be null.
If we set it to null when closing, we might expose ourselves to other NPEs in
other parts of the code. As seen from this issue, this code is finicky.
The main point I am making is that the deadlock (this issue) is fixed by your
reordering of the locks (nice change, btw), and that the null stuff is an
extraneous change.
We can add the null set/check to 0.96; for 0.94 I would prefer the minimum
change possible (unless you really feel strongly about this).
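To illustrate the general idea behind "reordering of the locks", here is a minimal sketch of consistent lock ordering. It is not the actual patch: the class LockOrderSketch, its methods, and its counters are hypothetical, and the two Object monitors only stand in for the 'updateLock' and 'flushLock' named in the issue. The assumption is that both the roller path and the syncer path take the two locks in the same order, so neither thread can hold one lock while waiting for the other in reverse.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the two monitors named in the issue.
    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    private int rolls = 0;
    private int syncs = 0;

    // Roller path: takes updateLock first, then flushLock.
    void roll() {
        synchronized (updateLock) {
            synchronized (flushLock) {
                rolls++;
            }
        }
    }

    // Syncer path: takes the locks in the SAME order as roll().
    // Taking flushLock first here would recreate the cycle seen in the jstack.
    void sync() {
        synchronized (updateLock) {
            synchronized (flushLock) {
                syncs++;
            }
        }
    }

    int rolls() {
        synchronized (flushLock) { return rolls; }
    }

    int syncs() {
        synchronized (flushLock) { return syncs; }
    }

    // Runs both paths concurrently; returns true if both threads finished
    // within the join timeout (i.e. no deadlock was observed).
    static boolean runBoth(LockOrderSketch s, int iterations) throws InterruptedException {
        Thread roller = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.roll();
        }, "logRoller");
        Thread syncer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.sync();
        }, "logSyncer");
        roller.start();
        syncer.start();
        roller.join(5000);
        syncer.join(5000);
        return !roller.isAlive() && !syncer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch s = new LockOrderSketch();
        boolean finished = runBoth(s, 100_000);
        System.out.println("finished=" + finished
            + " rolls=" + s.rolls() + " syncs=" + s.syncs());
    }
}
```

With a single global order there is no cycle in the lock graph, which is why this kind of fix needs no extra null handling on the writer to resolve the deadlock itself.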
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
> Key: HBASE-7728
> URL: https://issues.apache.org/jira/browse/HBASE-7728
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.2
> Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
> Reporter: Wang Qiang
> Assignee: Ted Yu
> Priority: Blocker
> Fix For: 0.96.0, 0.94.5
>
> Attachments: 7728-0.94.txt, 7728-0.94-v2.txt, 7728-suggest-0.96.txt,
> 7728-suggest.txt, 7728-v1.txt, 7728-v2.txt, 7728-v3.txt, 7728-v4.txt
>
>
> The hlog roller thread and the hlog syncer thread may deadlock on
> 'flushLock' and 'updateLock', which then causes all 'IPC Server handler'
> threads to block on hlog append. The jstack output is as follows:
> "regionserver60020.logRoller":
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
>   - waiting to lock <0x000000067bf88d58> (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
>   - locked <0x000000067d54ace0> (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>   at java.lang.Thread.run(Thread.java:662)
> "regionserver60020.logSyncer":
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
>   - waiting to lock <0x000000067d54ace0> (a java.lang.Object)
>   - locked <0x000000067bf88d58> (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
>   at java.lang.Thread.run(Thread.java:662)