[ 
https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568694#comment-13568694
 ] 

Anoop Sam John commented on HBASE-7728:
---------------------------------------

[~stack] Checked with 2 threads and giving break points,  one is doing the 
append and other doing the roll. Able to reproduce the deadlock and patch 
solves that. The patch make sure the locking order is maintained same as that 
with roller.

I think in the issue what happened is the 1st time call to hlogFlush() thrown 
an IOE but there were no roll happened and new writer created. The roller  
thread is still trying for cleanupCurrentWriter. So IOE because of some other 
reason happened. I dont think there was a parallel explicit close() call also 
happened. In close() we can see the it joins the log syncer thread and then 
only doing the writer close.

                
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
>                 Key: HBASE-7728
>                 URL: https://issues.apache.org/jira/browse/HBASE-7728
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.2
>         Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
>            Reporter: Wang Qiang
>            Assignee: Ted Yu
>            Priority: Blocker
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: 7728-0.94.txt, 7728-suggest-0.96.txt, 7728-suggest.txt, 
> 7728-v1.txt, 7728-v2.txt, 7728-v3.txt, 7728-v4.txt
>
>
> the hlog roller thread and hlog syncer thread may occur dead lock with the 
> 'flushLock' and 'updateLock', and then cause all 'IPC Server handler' thread 
> blocked on hlog append. the jstack info is as follow :
> "regionserver60020.logRoller":
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
>         - waiting to lock <0x000000067bf88d58> (a java.lang.Object)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
>         - locked <0x000000067d54ace0> (a java.lang.Object)
>         at 
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>         at java.lang.Thread.run(Thread.java:662)
> "regionserver60020.logSyncer":
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
>         - waiting to lock <0x000000067d54ace0> (a java.lang.Object)
>         - locked <0x000000067bf88d58> (a java.lang.Object)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to