### Motivation

We frequently hit an issue where many topics across all brokers get stuck in 
the creating-ledger state and the create-ledger callback never completes. 
Right now, it mainly happens when:

1. The ZK leader restarts.
2. The ZK quorum restarts due to the [32-bit 
rollover](https://jira.apache.org/jira/browse/ZOOKEEPER-1278).

In such cases, the zk-client doesn't complete the callback, and the broker keeps 
waiting in the creating-ledger state until we unload the topic. This happens to 
us frequently, such stuck topics are hard to catch, and recovering from them 
requires a rolling restart of the brokers.

So the broker should have a way to time out a LedgerOp (create/delete) and 
complete the callback with a timeout exception.

### Modifications

Add a Ledger-Op timeout configuration. The managed ledger fails the callback if 
ledger creation doesn't complete within that duration.
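
As a rough illustration of the idea (not the code in this PR), the sketch below guards an asynchronous operation with a timer so that the caller's callback always completes, either with the real result or with a `TimeoutException`. The class name `LedgerOpTimeoutSketch`, the helper `withTimeout`, and the use of `CompletableFuture` in place of the managed-ledger callback interfaces are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from the Pulsar code base.
final class LedgerOpTimeoutSketch {

    // Returns a future that mirrors `op`, but fails with TimeoutException
    // if `op` has not completed within `timeoutMs` milliseconds.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> op,
                                                long timeoutMs,
                                                ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // Timer side: fail the caller's future if the operation is still pending.
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            result.completeExceptionally(
                    new TimeoutException("ledger op timed out after " + timeoutMs + " ms"));
        }, timeoutMs, TimeUnit.MILLISECONDS);

        // Operation side: forward the outcome and cancel the pending timer.
        // CompletableFuture completes at most once, so whichever side runs
        // first wins and the other becomes a no-op.
        op.whenComplete((value, error) -> {
            timer.cancel(false);
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        });
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Simulates a create-ledger call whose callback never fires.
            CompletableFuture<Long> stuckCreate = new CompletableFuture<>();
            try {
                withTimeout(stuckCreate, 100, scheduler).get();
            } catch (ExecutionException e) {
                System.out.println("create-ledger failed: " + e.getCause());
            }
        } finally {
            scheduler.shutdown();
        }
    }
}
```

A late completion of the underlying operation is simply ignored here. A real implementation would likely also need to clean up a ledger that gets created after the timeout has already been reported. On Java 9 and later, `CompletableFuture.orTimeout` provides similar behaviour for futures.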

### Result

The broker can recover stuck managed ledgers that are waiting on a 
ledger-creation callback.


[ Full content available at: 
https://github.com/apache/incubator-pulsar/pull/2535 ]