hrsakai opened a new issue #1578: Auditor run Periodic check only once
URL: https://github.com/apache/bookkeeper/issues/1578
 
 
   **BUG REPORT**
   
   **What did you do?**
   
   In our cluster, the Auditor runs the periodic check only once. When the interval expires after the first periodic check, the Auditor does not run the check again.
   To run the periodic check again, we have to restart the Auditor bookie.
   
   Auditor's thread dump:
   It seems that the `AuditorBookie` thread is blocked waiting on a `CountDownLatch` for some reason.
   https://gist.github.com/hrsakai/d65e8e2cd511173232b1010a9bbdf126
   ```
   "AuditorBookie-XXXXX:3181" #40 daemon prio=5 os_prio=0 
tid=0x00007f049c117830 nid=0x5da4 waiting on condition [0x00007f0477dfc000]
      java.lang.Thread.State: WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for  <0x00000000e04e54f8> (a java.util.concurrent.CountDownLatch$Sync)
           at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
           at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
           at org.apache.bookkeeper.replication.Auditor.checkAllLedgers(Auditor.java:696)
           at org.apache.bookkeeper.replication.Auditor$5.run(Auditor.java:359)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
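
The `FutureTask.runAndReset` frame in the dump suggests the check runs as a periodic task on a `ScheduledThreadPoolExecutor`. Such an executor does not start the next execution of a periodic task until the previous one returns, so a single `CountDownLatch.await()` that never completes would be enough to stop all later runs. The following is a minimal sketch of that mechanism only, not the Auditor's actual code; the class name `StuckPeriodicTaskDemo`, the method `countRuns`, and the timings are all made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it does not use any BookKeeper code.
public class StuckPeriodicTaskDemo {

    /** Returns how many times the periodic task started within observeMillis. */
    public static int countRuns(boolean boundedWait, long observeMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        // Stands in for a completion callback that never fires
        // (assumption: e.g. because the underlying requests timed out).
        CountDownLatch neverCompleted = new CountDownLatch(1);

        executor.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            try {
                if (boundedWait) {
                    // Gives up after 20 ms so the task returns and can be rescheduled.
                    neverCompleted.await(20, TimeUnit.MILLISECONDS);
                } else {
                    // Parks forever; the executor never schedules the next run.
                    neverCompleted.await();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 50, TimeUnit.MILLISECONDS);

        try {
            Thread.sleep(observeMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow(); // interrupts the parked task and stops the worker
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unbounded await: " + countRuns(false, 500) + " run(s)");
        System.out.println("bounded await:   " + countRuns(true, 500) + " run(s)");
    }
}
```

With the unbounded `await()` the task starts exactly once, which matches the behavior reported here. The bounded variant is only one possible mitigation, shown to contrast the two cases; it is not necessarily how the project should or did fix this.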
   
   I saw many timed-out messages in the Auditor's log file:
   ```
   23:14:56.769 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  o.a.b.proto.PerChannelBookieClient   - Timed-out 2 operations to channel [id: 0xa17320e1, L:/AAAA:38234 - R:XXXX.co.jp/BBBB:3181] for BBBB:3181
   23:14:56.921 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  o.a.b.proto.PerChannelBookieClient   - Timed-out 48 operations to channel [id: 0x359c20bd, L:/AAAA:38222 - R:BBBB/BBBB:3181] for BBBB:3181
   23:15:37.768 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  o.a.b.proto.PerChannelBookieClient   - Timed-out 2 operations to channel [id: 0xa17320e1, L:/AAAA:38234 - R:XXXX.co.jp/BBBB:3181] for BBBB:3181
   23:15:37.921 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  o.a.b.proto.PerChannelBookieClient   - Timed-out 91 operations to channel [id: 0x359c20bd, L:/AAAA:38222 - R:BBBB/BBBB:3181] for BBBB:3181
   ・
   ・
   ```
   
   
   **What did you expect to see?**
   
   The Auditor runs the periodic check every time the interval expires.
   
   **What did you see instead?**
   
   The Auditor runs the periodic check only once.
   
   **System configuration**
   BookKeeper version : 4.7.0
   Number of Bookies: 5
   Ensemble size: 2
   Write quorum size: 2
   Ack quorum size: 2
   Periodic check interval: 1 day
   

