massakam opened a new pull request, #3655: URL: https://github.com/apache/bookkeeper/pull/3655
### Motivation Some time ago, I fixed a bug where check of all ledgers periodically run by the auditor got stuck (cf. https://github.com/apache/bookkeeper/pull/3214). That PR was merged so we cherry-picked it and released it, but `checkAllLedgers` still got stuck when the `inFlightReadEntryNumInLedgerChecker` value was very small. The cause is that multiple `BookKeeperClientWorker-OrderedExecutor` threads are blocked waiting for the semaphore to be released. This semaphore is released by the `BookKeeperClientWorker-OrderedExecutor` threads, so they should not run the `LedgerChecker#checkLedger` method. ``` "BookKeeperClientWorker-OrderedExecutor-1-0" #60 prio=5 os_prio=0 tid=0x00007f680401d800 nid=0x75e8 waiting on condition [0x00007f684a6e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000a2121780> (a java.util.concurrent.Semaphore$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.Semaphore.acquire(Semaphore.java:467) at org.apache.bookkeeper.client.LedgerChecker.acquirePermit(LedgerChecker.java:163) at org.apache.bookkeeper.client.LedgerChecker.checkLedger(LedgerChecker.java:431) at org.apache.bookkeeper.replication.Auditor.lambda$null$4(Auditor.java:1216) at org.apache.bookkeeper.replication.Auditor$$Lambda$137/1553334572.openComplete(Unknown Source) at org.apache.bookkeeper.client.LedgerOpenOp.openComplete(LedgerOpenOp.java:232) at org.apache.bookkeeper.client.LedgerOpenOp$2.readLastConfirmedComplete(LedgerOpenOp.java:218) at org.apache.bookkeeper.client.LedgerHandle$14.getLacComplete(LedgerHandle.java:1718) at org.apache.bookkeeper.client.PendingReadLacOp.readLacComplete(PendingReadLacOp.java:154) at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadLacCompletion$1.readLacComplete(PerChannelBookieClient.java:1801) at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadLacCompletion.handleV3Response(PerChannelBookieClient.java:1840) at org.apache.bookkeeper.proto.PerChannelBookieClient$3.safeRun(PerChannelBookieClient.java:1473) at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:750) ``` ### Changes In the `Auditor` class, have the `LedgerChecker#checkLedger` method run in a different thread from the `BookKeeperClientWorker-OrderedExecutor` threads. Master Issue: https://github.com/apache/bookkeeper/issues/3070 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
