[ https://issues.apache.org/jira/browse/BOOKKEEPER-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406824#comment-13406824 ]
Sijie Guo commented on BOOKKEEPER-326:
--------------------------------------
Ah, this issue is quite similar to BOOKKEEPER-215 and BOOKKEEPER-239. The
common cause of these JIRAs is that we call bookkeeper operations
(addEntry/readEntry) in previous operations' callbacks.
A possible fix (as Uma mentioned) is to submit bookkeeper operations to another
OrderedSafeExecutor (callbackWorker) rather than calling them directly in the
previous operation's callback; this approach is in the early patch for
BOOKKEEPER-215. I think this fix could resolve the deadlock encountered during
ledger recovery in both BOOKKEEPER-239 and BOOKKEEPER-326.
{code}
import org.apache.bookkeeper.client.DigestManager.RecoveryData;
import org.apache.bookkeeper.proto.BookieProtocol;
import org.apache.bookkeeper.proto.BookkeeperInternalCallbacks.GenericCallback;
+import org.apache.bookkeeper.util.SafeRunnable;
import org.apache.zookeeper.KeeperException;
import org.slf4j.Logger;
@@ -89,16 +90,22 @@ class LedgerRecoveryOp implements ReadCallback, AddCallback {
*/
private void doRecoveryRead() {
lh.lastAddConfirmed++;
- lh.asyncReadEntries(lh.lastAddConfirmed, lh.lastAddConfirmed, this, null);
+ LOG.debug("Submit recovery read entry {} for ledger {}", lh.lastAddConfirmed, lh.getId());
+ lh.bk.callbackWorker.submitOrdered(lh.getId(), new SafeRunnable() {
+ @Override
+ public void safeRun() {
+ lh.asyncReadEntries(lh.lastAddConfirmed, lh.lastAddConfirmed, LedgerRecoveryOp.this, null);
+ }
+ });
}
@Override
- public void readComplete(int rc, LedgerHandle lh, Enumeration<LedgerEntry> seq, Object ctx) {
+ public void readComplete(int rc, final LedgerHandle lh, Enumeration<LedgerEntry> seq, Object ctx) {
// get back to prev value
lh.lastAddConfirmed--;
if (rc == BKException.Code.OK) {
LedgerEntry entry = seq.nextElement();
- byte[] data = entry.getEntry();
+ final byte[] data = entry.getEntry();
/*
* We will add this entry again to make sure it is written to enough
@@ -106,7 +113,12 @@ class LedgerRecoveryOp implements ReadCallback, AddCallback {
* be added again when processing the call to add it.
*/
lh.length = entry.getLength() - (long) data.length;
- lh.asyncRecoveryAddEntry(data, 0, data.length, this, null);
+ lh.bk.callbackWorker.submitOrdered(lh.getId(), new SafeRunnable() {
+ @Override
+ public void safeRun() {
+ lh.asyncRecoveryAddEntry(data, 0, data.length, LedgerRecoveryOp.this, null);
+ }
+ });
return;
}
{code}
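To show the hand-off idea in isolation, here is a minimal standalone sketch. It
does not use any BookKeeper classes: OrderedWorker is a hypothetical stand-in
for OrderedSafeExecutor built on plain java.util.concurrent executors, and the
"io" thread, Recovery class, and asyncRead/readComplete methods are made up for
illustration. It does not reproduce the deadlock; it only demonstrates that the
callback never issues the next operation itself but submits it to a separate
ordered worker keyed by ledger id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CallbackHandOffSketch {

    /** Hypothetical stand-in for OrderedSafeExecutor: same key, same thread, FIFO order. */
    static final class OrderedWorker {
        private final ExecutorService[] threads;

        OrderedWorker(int n) {
            threads = new ExecutorService[n];
            for (int i = 0; i < n; i++) {
                final String name = "callback-worker-" + i;
                threads[i] = Executors.newSingleThreadExecutor(r -> new Thread(r, name));
            }
        }

        void submitOrdered(long key, Runnable task) {
            threads[(int) Math.floorMod(key, (long) threads.length)].execute(task);
        }

        void shutdown() {
            for (ExecutorService t : threads) {
                t.shutdown();
            }
        }
    }

    /** Simulated recovery that reads entries 0..lastEntry one after another. */
    static final class Recovery {
        final long ledgerId;
        final long lastEntry;
        // Assumption: a single thread delivers responses and runs callbacks.
        final ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "io"));
        final OrderedWorker callbackWorker = new OrderedWorker(2);
        final List<Long> recovered = Collections.synchronizedList(new ArrayList<>());
        final List<String> issuers = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch done = new CountDownLatch(1);

        Recovery(long ledgerId, long lastEntry) {
            this.ledgerId = ledgerId;
            this.lastEntry = lastEntry;
        }

        // Simulated async read: records the issuing thread, completes on the io thread.
        void asyncRead(long entryId) {
            issuers.add(Thread.currentThread().getName());
            ioThread.execute(() -> readComplete(entryId));
        }

        // Callback on the io thread: hand the next read off instead of calling it here.
        void readComplete(long entryId) {
            recovered.add(entryId);
            if (entryId == lastEntry) {
                done.countDown();
                return;
            }
            callbackWorker.submitOrdered(ledgerId, () -> asyncRead(entryId + 1));
        }
    }

    static Recovery run(long ledgerId, long lastEntry) {
        Recovery r = new Recovery(ledgerId, lastEntry);
        r.callbackWorker.submitOrdered(ledgerId, () -> r.asyncRead(0));
        try {
            if (!r.done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("recovery did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            r.ioThread.shutdown();
            r.callbackWorker.shutdown();
        }
        return r;
    }

    public static void main(String[] args) {
        Recovery r = run(7L, 3L);
        System.out.println("recovered entries: " + r.recovered);
        System.out.println("reads issued on:   " + r.issuers);
    }
}
```

In this sketch every read is issued from a callback-worker thread and never
from the "io" thread, so the thread that runs callbacks is never the one that
starts the next operation.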
> DeadLock during ledger recovery
> --------------------------------
>
> Key: BOOKKEEPER-326
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-326
> Project: Bookkeeper
> Issue Type: Bug
> Affects Versions: 4.1.0
> Reporter: Vinay
> Attachments: BK_DeadLock.log
>
>
> Deadlock found during ledger recovery. Please find the attached thread dump.