[
https://issues.apache.org/jira/browse/BOOKKEEPER-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432698#comment-13432698
]
surendra singh lilhore commented on BOOKKEEPER-326:
---------------------------------------------------
@Ivan
{quote}
I think it's near impossible to automatically and reliably trigger this
deadlock, as it's reliant on a delay occurring between bootstrap.connect, and
adding the FutureListener. I think I could get it to trigger fairly reliably
with some hacks in the code (not suitable for putting into production), so you
submit a patch with the fix, forgetting about the test case, run with the
hack to ensure it hits, apply your patch, and then run with the hack again to
verify the deadlock does not occur.
{quote}
The following scenario was tested with the patch applied and the JCarder agent
enabled (for deadlock detection).
1. A ledger is created and entries are written to 3 bookies.
2. One of the bookies is killed (say this bookie is not the first bookie in the
ensemble).
3. A new bookie is started.
4. An openLedger() call is made to recover the ledger. As part of this, a
readLastConfirmed request is added to each of the bookies with the callback
ReadLastConfirmedOp.readEntryComplete, which is synchronized.
5. The first callback comes from the first bookie, which is alive, in a
separate thread. It enters ReadLastConfirmedOp.readEntryComplete() and starts
processing.
6. Another callback comes for the failed bookie from the connect() method,
which holds the lock of the failed bookie's PerChannelBookieClient instance and
tries to invoke the same callback, but is BLOCKED. (To make the listener run in
the same thread, pause before future.addListener(..) with a debug breakpoint.)
7. As part of the first callback, doRecoveryRead() submits one PendingReadOp
request for an async read. If this PendingReadOp selects the
PerChannelBookieClient of the failed bookie for the read, it will deadlock.
(To reproduce, the bookieIndex variable can be changed to the index of the
failed bookie in PendingReadOp.sendRead(..).)
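For illustration, the lock ordering in steps 5 to 7 can be sketched in
isolation as below. This is not BookKeeper code and not the patch: the class
LockOrderDemo, the two lock objects, and the helper names are hypothetical
stand-ins for the PerChannelBookieClient instance and the synchronized
ReadLastConfirmedOp callback. The sketch only shows the unpatched
lock-order inversion and confirms it with the JDK's own deadlock detector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical, stand-alone model of the lock ordering; not BookKeeper code.
class LockOrderDemo {
    // Stand-in for the PerChannelBookieClient instance of the failed bookie.
    static final Object channelLock = new Object();
    // Stand-in for the ReadLastConfirmedOp whose callback is synchronized.
    static final Object opLock = new Object();

    /** Returns true if the JVM reports a monitor deadlock between the two threads. */
    static boolean reproduce() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Steps 5 and 7: the callback from the live bookie holds the op lock,
        // then the read it triggers needs the failed bookie's channel lock.
        Thread callbackThread = new Thread(() -> {
            synchronized (opLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (channelLock) {
                    // never reached while the other thread holds channelLock
                }
            }
        }, "live-bookie-callback");

        // Step 6: connect() holds the channel lock, then tries to invoke the
        // synchronized callback and blocks on the op lock.
        Thread connectThread = new Thread(() -> {
            synchronized (channelLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (opLock) {
                    // never reached while the other thread holds opLock
                }
            }
        }, "failed-bookie-connect");

        // Daemon threads, so the stuck threads do not keep the JVM alive.
        callbackThread.setDaemon(true);
        connectThread.setDaemon(true);
        callbackThread.start();
        connectThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            // Poll for up to about 5 seconds until both threads are blocked.
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null && ids.length >= 2) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Makes sure each thread holds its first lock before trying the second,
    // which plays the role of the debug point mentioned in step 6.
    static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

The latch forces the interleaving that the debug point forces in the real
reproduction, so the inversion is hit on every run rather than only when the
timing happens to line up.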
JCarder Output:
==================
Loaded from database files:
Nodes: 4112
Edges: 5946 (excluding 5786 duplicated)
Cycle analysis result:
Cycles: 0
Edges in cycles: 0
Nodes in cycles: 0
Max cycle depth: 0
Max graph depth: 2
Ignoring 0 gated cycle(s).
No cycles found!
> DeadLock during ledger recovery
> --------------------------------
>
> Key: BOOKKEEPER-326
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-326
> Project: Bookkeeper
> Issue Type: Bug
> Affects Versions: 4.1.0
> Reporter: Vinay
> Assignee: Rakesh R
> Priority: Blocker
> Fix For: 4.2.0
>
> Attachments: BK_DeadLock.log, BOOKKEEPER-326.1.patch,
> BOOKKEEPER-326.2.patch, BOOKKEEPER-326.3.patch, BOOKKEEPER-326.part2.diff,
> BOOKKEEPER-326.patch
>
>
> Deadlock found during ledger recovery. Please find the attached thread dump.