merlimat opened a new pull request #7110: URL: https://github.com/apache/pulsar/pull/7110
### Motivation

PersistentTopic registers a listener for updates to the policies of the topic. `PersistentTopic#initializeDispatchRateLimiterIfNeeded` runs under a lock on the `dispatchRateLimiter` object. This method is called from the constructor and also from a listener callback.

If the listener callback triggers during the initial read, the callback waits on the lock, which occupies one `ForkJoinPool.commonPool` thread. Multiple triggers of the callback occupy multiple `ForkJoinPool.commonPool` threads. The initial read needs a free thread in that same pool to complete; once the waiting callbacks have taken every thread, the read eventually times out.

This change fixes this in two ways.

Firstly, it moves reading from the ZK cache outside of the lock. This has a knock-on effect in other parts of the code, as DispatchRateLimiter and SubscriptionRateLimiter no longer read the Policies for themselves, but require the policies to be passed in.

Secondly, it makes the zookeeper cache use its own executor rather than `ForkJoinPool.commonPool`.

This can help avoid deadlocks but cannot eliminate them completely if we allow synchronous calls from within callbacks. If we allow this kind of call, we are susceptible to deadlocks no matter how many threads we add: to allow N threads to make synchronous calls in callbacks, we need N+1 threads. Fixing this properly would involve ensuring that synchronous calls do not depend on the async callback executor, which seems like a fairly big change.

This problem was the root cause of some flakes on ReplicatorTest.
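The starvation described above can be reproduced in isolation. The following is a minimal sketch, not Pulsar code: the class `CallbackStarvationSketch`, its method names, the fixed-size pool standing in for the callback executor, the `"policies"` value, and the 500 ms timeout are all assumptions made for illustration. It models callbacks that block on a lock held by a thread that is itself waiting for a read served by the same pool, and contrasts that with performing the read before taking the lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-alone model of the problem; none of these names exist in Pulsar.
public class CallbackStarvationSketch {

    /**
     * Models the original behaviour: the caller holds the lock while waiting
     * for a read that needs a pool thread, and listener callbacks block on the
     * same lock. Returns true if the read completes before the timeout.
     */
    public static boolean readUnderLockCompletes(int poolSize, int callbacks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Object lock = new Object();
        // Wait until as many callbacks as can run concurrently have started.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, callbacks));
        try {
            synchronized (lock) {
                for (int i = 0; i < callbacks; i++) {
                    pool.execute(() -> {
                        started.countDown();
                        // The callback parks here and keeps its pool thread occupied.
                        synchronized (lock) { /* would update the rate limiter */ }
                    });
                }
                started.await();
                // The read is served by the same pool, so it needs one free thread.
                CompletableFuture<String> read =
                        CompletableFuture.supplyAsync(() -> "policies", pool);
                try {
                    read.get(500, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    /**
     * Models the first fix: the read happens before the lock is taken, so
     * callbacks acquire the lock briefly and release their threads.
     */
    public static boolean readOutsideLockCompletes(int poolSize, int callbacks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Object lock = new Object();
        try {
            for (int i = 0; i < callbacks; i++) {
                pool.execute(() -> {
                    synchronized (lock) { /* would update the rate limiter */ }
                });
            }
            CompletableFuture<String> read =
                    CompletableFuture.supplyAsync(() -> "policies", pool);
            String policies = read.get(500, TimeUnit.MILLISECONDS);
            synchronized (lock) {
                // Policies are passed in rather than read while holding the lock.
                return "policies".equals(policies);
            }
        } catch (TimeoutException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("N threads, N callbacks, read under lock:   " + readUnderLockCompletes(2, 2));
        System.out.println("N+1 threads, N callbacks, read under lock: " + readUnderLockCompletes(3, 2));
        System.out.println("N threads, N callbacks, read outside lock: " + readOutsideLockCompletes(2, 2));
    }
}
```

In this model, two callbacks on a two-thread pool should leave no thread for the read, so the first call is expected to time out, while a third thread or moving the read out of the lock should let it finish. Adding threads only shifts the threshold, which is the N+1 observation made in the description.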
