horizonzy commented on code in PR #21797:
URL: https://github.com/apache/pulsar/pull/21797#discussion_r1436278822


##########
managed-ledger/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java:
##########
@@ -288,6 +288,7 @@ protected void stopBKCluster() throws Exception {
 
         for (ServerTester t : servers) {
             t.shutdown();
+            t.stopAutoRecovery();

Review Comment:
   OK, I found the problem. The test may kill the follower auditor, but the follower auditor cannot notice that it is being closed: its elector thread is blocked waiting for the leader auditor to shut down.
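   The failure mode above can be illustrated with a minimal, hypothetical sketch of a monitor-based leader wait. This is not the actual `PulsarLedgerAuditorManager` code. `MonitorElection`, `tryToBecomeLeader()`, `leaderGone()` and the `closed` flag are illustrative names only. The waiting loop re-checks a `closed` flag, and `close()` calls `notifyAll()` under the same monitor, so a follower parked in `Object.wait()` is released when it is closed. If the loop only watched for the leader going away, closing the follower would leave it stuck in `wait()`, much like the `AuditorElector` thread in the dump in this comment.

   ```java
   // Hypothetical, simplified model of a leader election that waits on a monitor.
   // NOT the real PulsarLedgerAuditorManager; all names are illustrative.
   class MonitorElection {
       private boolean leaderPresent = true; // another node currently holds leadership
       private boolean closed = false;       // set by close() to release any waiter

       /**
        * Blocks until leadership becomes available or this election is closed.
        * Returns true if this node became leader, false if it was closed first.
        */
       synchronized boolean tryToBecomeLeader() throws InterruptedException {
           // Re-check both conditions after every wake-up (also handles spurious wake-ups).
           while (leaderPresent && !closed) {
               wait();
           }
           if (closed) {
               return false;
           }
           leaderPresent = true; // this node takes over leadership
           return true;
       }

       /** Called when the current leader goes away (for example, its session expires). */
       synchronized void leaderGone() {
           leaderPresent = false;
           notifyAll();
       }

       /** Without the closed flag and this notifyAll(), a waiter could stay blocked forever. */
       synchronized void close() {
           closed = true;
           notifyAll();
       }
   }
   ```

   With this design, a follower blocked in `tryToBecomeLeader()` returns `false` as soon as `close()` is called, instead of waiting for a leader shutdown that may never happen.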
   
   ### Reproduction code:
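   ```java
       public TestAutoRecoveryAlongWithBookieServers() throws Exception {
           // Two bookies, each running an AutoRecovery daemon (one leader auditor, one follower).
           super(2);
           setAutoRecoveryEnabled(true);
           // Load the Pulsar metadata driver classes so that they register themselves with BookKeeper.
           Class.forName("org.apache.pulsar.metadata.bookkeeper.PulsarMetadataClientDriver");
           Class.forName("org.apache.pulsar.metadata.bookkeeper.PulsarMetadataBookieDriver");
       }
   
       @Test
       public void testAutoRecoveryAlongWithBookieServers() throws Exception {
           BookieId firstBookie = getBookie(0);
           BookieId secondBookie = getBookie(1);
           // Wait up to 10 seconds for the leader auditor to be elected.
           Auditor auditor = getAuditor(10000, TimeUnit.MILLISECONDS);
           // Read the private bookieIdentifier field to find out which bookie hosts the leader auditor.
           Field bookieIdentifier = auditor.getClass().getDeclaredField("bookieIdentifier");
           bookieIdentifier.setAccessible(true);
           String auditorId = (String) bookieIdentifier.get(auditor);
           // First kill: the bookie that hosts the follower auditor.
           System.out.println("The first kill");
           if (firstBookie.toString().equals(auditorId)) {
               killBookie(secondBookie);
           } else {
               killBookie(firstBookie);
           }
           Thread.sleep(10000);
           // Second kill: the bookie that hosts the leader auditor.
           System.out.println("The second kill");
           if (firstBookie.toString().equals(auditorId)) {
               killBookie(firstBookie);
           } else {
               killBookie(secondBookie);
           }
           // Block here so the JVM stays alive and can be inspected with jstack.
           System.in.read();
       }
   
   ```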
   
   Then use `jstack` to dump the threads, and you will see that the follower auditor cannot shut down:
   
   
   ```
   "AuditorElector-127.0.0.1:54580" #220 prio=5 os_prio=31 cpu=0.15ms elapsed=458.13s tid=0x000000015e200800 nid=0x18a03 in Object.wait()  [0x00000004f144e000]
      java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x000020000888b930> (a org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager)
        at java.lang.Object.wait([email protected]/Object.java:338)
        at org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager.tryToBecomeAuditor(PulsarLedgerAuditorManager.java:90)
        - locked <0x000020000888b930> (a org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager)
        at org.apache.bookkeeper.replication.AuditorElector$3.run(AuditorElector.java:185)
        at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
        at java.util.concurrent.FutureTask.run$$$capture([email protected]/FutureTask.java:264)
        at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:833)
   ```
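   As an in-process alternative to running `jstack` by hand, a hypothetical helper could look for stuck elector threads from inside the test JVM using the standard `java.lang.management` API. `StuckThreadFinder` and `findWaitingThreads` are illustrative names, not part of BookKeeper or Pulsar; the `AuditorElector-` prefix is taken from the thread name in the dump above.

   ```java
   import java.lang.management.ManagementFactory;
   import java.lang.management.ThreadInfo;
   import java.lang.management.ThreadMXBean;
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical helper: lists waiting threads by name prefix, similar to grepping a jstack dump.
   class StuckThreadFinder {
       /**
        * Returns one summary line per live thread whose name starts with namePrefix
        * and whose state is WAITING or TIMED_WAITING, including its top stack frame.
        */
       static List<String> findWaitingThreads(String namePrefix) {
           ThreadMXBean mx = ManagementFactory.getThreadMXBean();
           // true, true: also collect locked monitors and ownable synchronizers.
           ThreadInfo[] infos = mx.dumpAllThreads(true, true);
           List<String> result = new ArrayList<>();
           for (ThreadInfo info : infos) {
               // A null entry means the thread is no longer alive.
               if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                   continue;
               }
               Thread.State state = info.getThreadState();
               if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                   continue;
               }
               StackTraceElement[] stack = info.getStackTrace();
               String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
               // getLockName() is the object being waited on, or null if there is none.
               String lock = info.getLockName() == null ? "" : " on " + info.getLockName();
               result.add(info.getThreadName() + " " + state + lock + " at " + top);
           }
           return result;
       }
   }
   ```

   For example, calling `StuckThreadFinder.findWaitingThreads("AuditorElector-")` after the second kill would be expected to report the follower's elector thread if it is still parked in `tryToBecomeAuditor`.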



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
