lhotari opened a new pull request, #21496: URL: https://github.com/apache/pulsar/pull/21496
### Motivation PulsarService shutdown leaks threads in tests when execution is slow enough that the thread gets interrupted before the second phase of the shutdown is executed. It is hard to reproduce this issue on an ordinary developer machine because execution completes so quickly that the path where the thread has already been interrupted doesn't happen. In CI, there's 2 cores and 4 test processes running concurrently and the problem occurs fairly often. For example, the thread leak detector reported that ServerCnxTest leaked 144 threads [in this build](https://github.com/apache/pulsar/actions/runs/6710630191/job/18242485629#step:16:201): ``` Tests in class ServerCnxTest created thread id 10950 with name 'Thread[broker-topic-workers-OrderedExecutor-0-0,5,main]' Tests in class ServerCnxTest created thread id 10951 with name 'Thread[broker-topic-workers-OrderedExecutor-1-0,5,main]' Tests in class ServerCnxTest created thread id 10952 with name 'Thread[pulsar-stats-updater-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 10953 with name 'Thread[pulsar-inactivity-monitor-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 10954 with name 'Thread[pulsar-msg-expiry-monitor-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 10955 with name 'Thread[pulsar-compaction-monitor-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 10956 with name 'Thread[pulsar-consumed-ledgers-monitor-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 10957 with name 'Thread[pulsar-backlog-quota-checker-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 11122 with name 'Thread[broker-topic-workers-OrderedExecutor-0-0,5,main]' Tests in class ServerCnxTest created thread id 11123 with name 'Thread[broker-topic-workers-OrderedExecutor-1-0,5,main]' Tests in class ServerCnxTest created thread id 11124 with name 'Thread[pulsar-stats-updater-OrderedScheduler-0-0,5,main]' Tests in class ServerCnxTest created thread id 11125 with name 'Thread[pulsar-inactivity-monitor-OrderedScheduler-0-0,5,main]' ... ... ... Warning: Summary: Tests in class org.apache.pulsar.broker.service.ServerCnxTest created 144 new threads. There are now 154 threads in total. ``` ### Modifications - Adding the future cancellation timeout logic doesn't make sense when `brokerShutdownTimeoutMs=0`. Skip adding the timeout logic and refactor the method. - Running the second phase on the completion thread blocks a Netty thread. It's better to run the 2nd phase of the shutdown in a new separate thread. ### Documentation <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. --> - [ ] `doc` <!-- Your PR contains doc changes. --> - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later --> - [x] `doc-not-needed` <!-- Your PR changes do not impact docs --> - [ ] `doc-complete` <!-- Docs have been already added --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
