pushkar-engagio opened a new issue #6163: Pulsar cluster slow to recover from a failed bookkeeper node
URL: https://github.com/apache/pulsar/issues/6163

#### Expected behavior

I am trying to replace the BookKeeper nodes in the cluster. I have a 3-node BookKeeper cluster, to which I added 3 new BookKeeper nodes. I then shut down one of the nodes and started the `decommissionbookie` process to move its ledgers to the other healthy nodes.

There were 13k ledgers in total. The process is still running, but after 3 weeks I still have around 6k ledgers left to recover. The first 6k ledgers were recovered within 2 days; since then, recovery has been very slow.

#### Actual behavior

The cluster has been slow to transfer data to the other healthy nodes. The logs show the following errors:

1. `ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry:`
2. `Error: Bookie handle is not available while reading` (all 5 bookies are operational)
3. `Error: Too many requests to the same Bookie while reading`

#### Steps to reproduce

Stop BookKeeper on one of the nodes and run `decommissionbookie` to replicate its data to the rest of the cluster.

#### System configuration

**Pulsar version**: 2.3
**Operating system**: Amazon Linux 2
**Java version**: openjdk version "1.8.0_222"
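A quick back-of-the-envelope calculation from the figures in this report shows how sharp the slowdown is. This is only a sketch: the inputs are the rounded numbers given in the issue (13k ledgers total, 6k recovered in the first 2 days, around 6k still pending after 3 weeks), and it assumes 3 weeks = 21 days and roughly steady progress within each phase.

```python
# Back-of-the-envelope recovery rates, using the approximate figures from this
# issue. All inputs are rounded estimates, not measured values.
TOTAL_LEDGERS = 13_000      # total ledgers on the decommissioned bookie
DONE_FIRST_PHASE = 6_000    # ledgers recovered in the first phase
FIRST_PHASE_DAYS = 2        # length of the first phase
REMAINING_LEDGERS = 6_000   # ledgers still pending
ELAPSED_DAYS = 21           # assumption: "3 weeks" taken as 21 days


def recovery_rates():
    """Return (first-phase rate, later rate) in ledgers per day."""
    done_total = TOTAL_LEDGERS - REMAINING_LEDGERS  # ~7,000 recovered so far
    done_later = done_total - DONE_FIRST_PHASE      # ~1,000 recovered after day 2
    later_days = ELAPSED_DAYS - FIRST_PHASE_DAYS    # ~19 days
    return DONE_FIRST_PHASE / FIRST_PHASE_DAYS, done_later / later_days


if __name__ == "__main__":
    fast, slow = recovery_rates()
    print(f"first {FIRST_PHASE_DAYS} days: ~{fast:.0f} ledgers/day")
    print(f"since then: ~{slow:.0f} ledgers/day ({fast / slow:.0f}x slower)")
    print(f"days left at current rate: ~{REMAINING_LEDGERS / slow:.0f}")
```

With these inputs it reports about 3,000 ledgers/day for the first two days versus about 53 ledgers/day since then, roughly a 57x drop, which would leave on the order of 114 more days at the current rate.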
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
