sijie opened a new pull request #7506:
URL: https://github.com/apache/pulsar/pull/7506


   *Motivation*
   
   Currently the broker has a timeout mechanism for loading topics. However, the
   underlying managed ledger library does not provide one. This leads to the
   following situation: a TopicLoad operation times out after 30 seconds, but the
   CompletableFuture for opening the managed ledger is still kept in the cache of
   the managed ledger factory. That CompletableFuture never completes, so all
   subsequent topic lookups fail, because any attempt to load the topic reuses
   the cached future and never tries to re-open the managed ledger.
   
   *Modification*
   
   Introduce a timeout mechanism in the managed ledger factory. If a managed
   ledger is not opened within a given timeout period, its CompletableFuture is
   removed from the cache. This allows subsequent attempts to load the topic to
   try opening the managed ledger again.
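
   The idea can be sketched roughly as follows. This is a minimal illustration
   under stated assumptions, not the code in this pull request: the class
   `TimedOpenCache`, its `open` method, and the `pending` map are hypothetical
   names, and the real managed ledger factory may structure this differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: a cache of in-flight "open" futures with a timeout.
// None of these names come from the managed ledger library.
public class TimedOpenCache<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> pending =
            new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler;
    private final long timeoutMillis;

    public TimedOpenCache(ScheduledExecutorService scheduler, long timeoutMillis) {
        this.scheduler = scheduler;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the cached future for `name`, or starts a new open via `opener`.
    public CompletableFuture<V> open(String name, Supplier<CompletableFuture<V>> opener) {
        return pending.computeIfAbsent(name, key -> {
            CompletableFuture<V> result = new CompletableFuture<>();

            // Forward the outcome of the real open operation to the cached future.
            opener.get().whenComplete((value, error) -> {
                if (error != null) {
                    result.completeExceptionally(error);
                } else {
                    result.complete(value);
                }
            });

            // If the open has not finished by the deadline, evict the entry first
            // and then fail the future, so a caller that observes the failure and
            // retries starts a fresh open. An open that completes in the narrow
            // window between the check and the eviction is evicted too; the only
            // cost is that a later call opens again.
            scheduler.schedule(() -> {
                if (!result.isDone()) {
                    pending.remove(key, result);
                    result.completeExceptionally(new TimeoutException(
                            "open of " + key + " timed out after " + timeoutMillis + " ms"));
                }
            }, timeoutMillis, TimeUnit.MILLISECONDS);

            return result;
        });
    }
}
```

   With this shape, a call such as `cache.open("my-topic", opener)` whose opener
   never completes fails with a TimeoutException after the configured period, and
   the next call for the same name invokes its opener again instead of reusing
   the stuck future. Only timeouts are evicted here; what to do with futures
   that fail for other reasons is left out of the sketch.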
   
   *Tests*
   
   This problem can be consistently reproduced in a chaos test on Kubernetes by
   killing k8s worker nodes. It can leave a producer stuck forever, until the
   owner broker pod is restarted. The change has been verified in a chaos
   testing environment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

