[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432698#comment-13432698
 ] 

surendra singh lilhore commented on BOOKKEEPER-326:
---------------------------------------------------

@Ivan
{quote}
I think it's near impossible to automatically and reliably trigger this 
deadlock, as it's reliant on a delay occurring between bootstrap.connect, and 
adding the FutureListener. I think I could get it to trigger fairly reliably 
with some hacks in the code (not suitable for putting into production), so you 
submit a patch with the fix, forgetting about the test case, run with the 
hack to ensure it hits, apply your patch, and then run with the hack again to 
verify the deadlock does not occur.
{quote}

The following scenario was tested with the patch applied and the JCarder 
agent enabled (for deadlock detection).

1. A ledger is created and entries are written to 3 bookies.
2. One of the bookies is killed (say this bookie is not the first bookie in 
the ensemble).
3. A new bookie is started.
4. An openLedger() call is made to recover the ledger. As part of this, a 
readLastConfirmed request is added for each of the bookies with the callback 
ReadLastConfirmedOp.readEntryComplete, which is synchronized.
5. The first callback arrives from the first bookie, which is alive, in a 
separate thread. It enters ReadLastConfirmedOp.readEntryComplete() and starts 
processing.
6. Another callback arrives for the failed bookie from the connect() method, 
which holds the lock on the PerChannelBookieClient instance of the failed 
bookie. It tries to invoke the same callback but is BLOCKED. (To get the 
listener invoked in the same thread, pause just before future.addListener(..) 
with a debug breakpoint.)
7. As part of the first callback, doRecoveryRead() issues a PendingReadOp 
request for an async read. If this PendingReadOp selects the same 
PerChannelBookieClient of the failed bookie for the read, it will deadlock. 
(To reproduce, the bookieIndex variable can be changed to the index of the 
failed bookie in PendingReadOp.sendRead(..).)
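
For anyone following along, the lock ordering in steps 5 to 7 can be sketched 
in plain Java. This is only an illustration and not BookKeeper code: the class 
LockOrderSketch, the two lock objects and the thread names are hypothetical 
stand-ins for the ReadLastConfirmedOp monitor and the PerChannelBookieClient 
monitor, and the latch plays the role of the debug breakpoint that forces the 
interleaving. It shows the unpatched ordering, where each thread takes one 
monitor and then asks for the other.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the lock-order inversion described in steps 5-7.
class LockOrderSketch {

    // Returns true if the JVM reports a monitor deadlock within about 5 seconds.
    static boolean deadlockObserved() {
        // Stand-in for the synchronized ReadLastConfirmedOp callback (assumption).
        final Object readOpLock = new Object();
        // Stand-in for the PerChannelBookieClient of the failed bookie (assumption).
        final Object channelLock = new Object();
        // Forces both threads to hold their first monitor before taking the second.
        final CountDownLatch bothHeld = new CountDownLatch(2);

        Thread firstBookieCallback = new Thread(() -> {
            synchronized (readOpLock) {          // step 5: inside readEntryComplete()
                bothHeld.countDown();
                awaitQuietly(bothHeld);
                synchronized (channelLock) {     // step 7: read routed to failed bookie
                    // never reached while the other thread holds channelLock
                }
            }
        }, "callback-first-bookie");

        Thread failedBookieConnect = new Thread(() -> {
            synchronized (channelLock) {         // step 6: connect() holds the client lock
                bothHeld.countDown();
                awaitQuietly(bothHeld);
                synchronized (readOpLock) {      // step 6: invoking the same callback
                    // never reached while the other thread holds readOpLock
                }
            }
        }, "connect-failed-bookie");

        // Daemon threads so the JVM can still exit once the deadlock is observed.
        firstBookieCallback.setDaemon(true);
        failedBookieConnect.setDaemon(true);
        firstBookieCallback.start();
        failedBookieConnect.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock observed: " + deadlockObserved());
    }
}
```

The sketch only models the two monitors involved. It says nothing about how 
the patch removes the cycle; the JCarder run below is the evidence for that.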

JCarder Output:
==================
Loaded from database files:
   Nodes: 4112
   Edges: 5946 (excluding 5786 duplicated)

Cycle analysis result:
   Cycles:          0
   Edges in cycles: 0
   Nodes in cycles: 0
   Max cycle depth: 0
   Max graph depth: 2

Ignoring 0 gated cycle(s).
No cycles found!


                
> DeadLock during ledger recovery 
> --------------------------------
>
>                 Key: BOOKKEEPER-326
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-326
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 4.1.0
>            Reporter: Vinay
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: BK_DeadLock.log, BOOKKEEPER-326.1.patch, 
> BOOKKEEPER-326.2.patch, BOOKKEEPER-326.3.patch, BOOKKEEPER-326.part2.diff, 
> BOOKKEEPER-326.patch
>
>
> Deadlock found during ledger recovery. Please find the attached thread dump.
