horizonzy opened a new pull request, #21181:
URL: https://github.com/apache/pulsar/pull/21181

   ### Motivation
   After the AutoRecovery cluster is started, it will elect an Auditor Leader 
to perform its work, while the remaining nodes act as Followers. When the 
Leader node goes down, the Followers will initiate an election process to 
select a new Auditor Leader to carry out the tasks. This mechanism ensures high 
availability of the Auditor component.
   
   In the implementation of PulsarLedgerAuditorManager, the Auditor Follower 
waits inside a while(true) loop and exits only once it becomes the Leader. 
Status changes are delivered to it by LeaderElection.
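
A minimal sketch of the wait-until-leader pattern described above. The class `FollowerLoopSketch`, its `State` enum, and both method names are hypothetical stand-ins chosen for illustration; they are not the actual PulsarLedgerAuditorManager or LeaderElection code.

```java
/** Hypothetical stand-in for the follower loop; not the real Pulsar class. */
class FollowerLoopSketch {
    enum State { NO_LEADER, FOLLOWING, LEADING }

    private State state = State.FOLLOWING;

    /** Assumed election callback: records the new state and wakes any waiter. */
    synchronized void onStateChange(State newState) {
        state = newState;
        notifyAll();
    }

    /** Blocks until this node becomes the leader. */
    synchronized void awaitLeadership() throws InterruptedException {
        while (true) {
            if (state == State.LEADING) {
                return; // the only exit from the loop
            }
            wait(); // released only by notifyAll() in onStateChange()
        }
    }
}
```

In this sketch the waiting thread can leave the loop only if `onStateChange` is called with `LEADING`. If the callback is never invoked again, the thread stays parked in `wait()`.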
   
   There is a mechanism in place: when a SessionLost event occurs, AutoRecovery 
initiates a shutdown, which closes the Auditor.
   
   However, once a SessionLost event happens, LeaderElection can no longer 
push status change notifications to the Auditor Follower. The Follower 
therefore never receives a notification, its thread stays blocked in wait(), 
and the Auditor cannot shut down.
   
   The stack trace of the stuck thread:
   ```
   "AuditorElector-127.0.0.1:58641" #257 prio=5 os_prio=31 cpu=0.22ms elapsed=69.67s tid=0x000000013bc23800 nid=0x2a20f in Object.wait()  [0x00000004e3302000]
      java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x0000200006026150> (a org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager)
        at java.lang.Object.wait([email protected]/Object.java:338)
        at org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager.tryToBecomeAuditor(PulsarLedgerAuditorManager.java:77)
        - locked <0x0000200006026150> (a org.apache.pulsar.metadata.bookkeeper.PulsarLedgerAuditorManager)
        at org.apache.bookkeeper.replication.AuditorElector$3.run(AuditorElector.java:185)
        at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
        at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:833)
   ```
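
One possible way to let such a waiter exit during shutdown is sketched below. This message does not describe the PR's actual change, so treat this purely as an illustration: the class `ClosableFollowerSketch`, the `closed` flag, and all method names are assumptions, not the real Pulsar code.

```java
/** Hypothetical sketch; names are illustrative, not the actual Pulsar classes. */
class ClosableFollowerSketch {
    private boolean leader = false;
    private boolean closed = false;

    /** Assumed election callback; after a session loss it may never be called again. */
    synchronized void onBecameLeader() {
        leader = true;
        notifyAll();
    }

    /** Shutdown path: sets a flag and wakes the waiter so it can leave the loop. */
    synchronized void close() {
        closed = true;
        notifyAll();
    }

    /** Returns true if leadership was acquired, false if close() was called first. */
    synchronized boolean awaitLeadership() throws InterruptedException {
        while (!leader && !closed) {
            wait();
        }
        return leader;
    }
}
```

Here the loop condition checks both flags, so `close()` alone is enough to release the thread even when no further election notification arrives.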
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   

