[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406824#comment-13406824
 ] 

Sijie Guo commented on BOOKKEEPER-326:
--------------------------------------

Ah, this issue is quite similar to BOOKKEEPER-215 and BOOKKEEPER-239. The 
common cause of these JIRAs is that we call BookKeeper operations 
(addEntry/readEntry) from within the callbacks of previous operations. 

A possible fix (as Uma mentioned) is to submit BookKeeper operations to another 
OrderedSafeExecutor (callbackWorker) instead of calling them directly in the 
previous operation's callback; this was in the early patch for BOOKKEEPER-215. 
I think this fix could resolve the deadlock encountered during ledger recovery 
in both BOOKKEEPER-239 and BOOKKEEPER-326.

{code}
 import org.apache.bookkeeper.client.DigestManager.RecoveryData;
 import org.apache.bookkeeper.proto.BookieProtocol;
 import org.apache.bookkeeper.proto.BookkeeperInternalCallbacks.GenericCallback;
+import org.apache.bookkeeper.util.SafeRunnable;
 
 import org.apache.zookeeper.KeeperException;
 import org.slf4j.Logger;
@@ -89,16 +90,22 @@ class LedgerRecoveryOp implements ReadCallback, AddCallback {
      */
     private void doRecoveryRead() {
         lh.lastAddConfirmed++;
-        lh.asyncReadEntries(lh.lastAddConfirmed, lh.lastAddConfirmed, this, null);
+        LOG.debug("Submit recovery read entry {} for ledger {}", lh.lastAddConfirmed, lh.getId());
+        lh.bk.callbackWorker.submitOrdered(lh.getId(), new SafeRunnable() {
+            @Override
+            public void safeRun() {
+                lh.asyncReadEntries(lh.lastAddConfirmed, lh.lastAddConfirmed, LedgerRecoveryOp.this, null);
+            }
+        });
     }
 
     @Override
-    public void readComplete(int rc, LedgerHandle lh, Enumeration<LedgerEntry> seq, Object ctx) {
+    public void readComplete(int rc, final LedgerHandle lh, Enumeration<LedgerEntry> seq, Object ctx) {
         // get back to prev value
         lh.lastAddConfirmed--;
         if (rc == BKException.Code.OK) {
             LedgerEntry entry = seq.nextElement();
-            byte[] data = entry.getEntry();
+            final byte[] data = entry.getEntry();
 
             /*
              * We will add this entry again to make sure it is written to enough
@@ -106,7 +113,12 @@ class LedgerRecoveryOp implements ReadCallback, AddCallback {
              * be added again when processing the call to add it.
              */
             lh.length = entry.getLength() - (long) data.length;
-            lh.asyncRecoveryAddEntry(data, 0, data.length, this, null);
+            lh.bk.callbackWorker.submitOrdered(lh.getId(), new SafeRunnable() {
+                @Override
+                public void safeRun() {
+                    lh.asyncRecoveryAddEntry(data, 0, data.length, LedgerRecoveryOp.this, null);
+                }
+            });
 
             return;
         }
{code}
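
To illustrate the general pattern behind this fix (not the exact lock cycle in 
the attached thread dump), here is a minimal, self-contained sketch that uses 
only java.util.concurrent. Everything in it is hypothetical: the class 
CallbackHandoffSketch, the blockingOp helper, and the plain single-thread 
executors stand in for the callback thread and callbackWorker. It does not use 
the BookKeeper API or OrderedSafeExecutor. It assumes the operation started in 
the callback needs the callback's own thread to make progress, and it uses a 
timeout so the stuck case returns false instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class CallbackHandoffSketch {

    // Hypothetical operation (not a BookKeeper call): it queues a completion
    // step on the callback thread and waits up to 500 ms for that step to run.
    static boolean blockingOp(ExecutorService callbackThread) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        callbackThread.execute(done::countDown);
        return done.await(500, TimeUnit.MILLISECONDS);
    }

    // Problem pattern: the operation is called directly inside a callback that
    // is itself running on the callback thread. The completion step is queued
    // behind the caller, so the wait times out and this returns false.
    static boolean callInsideCallback() throws Exception {
        ExecutorService callbackThread = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> callback = () -> blockingOp(callbackThread);
            return callbackThread.submit(callback).get();
        } finally {
            callbackThread.shutdownNow();
        }
    }

    // Fix pattern: the callback only submits the operation to a separate
    // worker and returns, leaving the callback thread free to run the
    // completion step. This returns true.
    static boolean handOffToWorker() throws Exception {
        ExecutorService callbackThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> op = () -> blockingOp(callbackThread);
            Callable<Future<Boolean>> callback = () -> worker.submit(op);
            Future<Boolean> pending = callbackThread.submit(callback).get();
            return pending.get();
        } finally {
            worker.shutdownNow();
            callbackThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("direct call completed: " + callInsideCallback());
        System.out.println("hand-off completed:    " + handOffToWorker());
    }
}
```

The patch above has the same shape as handOffToWorker: readComplete and 
doRecoveryRead no longer start the next operation themselves, they only submit 
it to callbackWorker and return.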

                
> DeadLock during ledger recovery 
> --------------------------------
>
>                 Key: BOOKKEEPER-326
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-326
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 4.1.0
>            Reporter: Vinay
>         Attachments: BK_DeadLock.log
>
>
> Deadlock found during ledger recovery. Please find the attached thread dump.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
