pushkar-engagio opened a new issue #6163: Pulsar cluster slow to recover from a 
failed bookkeeper node
URL: https://github.com/apache/pulsar/issues/6163
 
 
   #### Expected behavior
   I am replacing the BookKeeper nodes in a cluster. I had a 3-node 
BookKeeper cluster and added 3 new BookKeeper nodes to it. I then shut down one 
of the nodes and started the decommissionbookie process to move its ledgers to 
the other healthy nodes. There were 13k ledgers in total. The process is still 
running, but after 3 weeks around 6k ledgers remain to be recovered.
   The first 6k ledgers completed within 2 days, but recovery has been very 
slow since then.
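   To put a rough number on the slowdown, the figures above can be turned into 
recovery rates. This is only a back-of-the-envelope sketch: the ledger counts 
are the rounded values from this report, "3 weeks" is assumed to mean 21 days, 
and the function name is illustrative rather than part of any tool.

```python
# Rough recovery-rate estimate from the approximate figures in this report.
# All inputs are rounded; "3 weeks" is assumed to be 21 days.

def recovery_rates(total, fast_done, fast_days, elapsed_days, remaining):
    """Return (fast_rate, slow_rate, eta_days), rates in ledgers per day."""
    # Ledgers recovered after the initial fast phase.
    slow_done = total - fast_done - remaining
    slow_days = elapsed_days - fast_days
    fast_rate = fast_done / fast_days
    slow_rate = slow_done / slow_days
    # Days left if the slow rate stays constant (an assumption).
    eta_days = remaining / slow_rate
    return fast_rate, slow_rate, eta_days


fast_rate, slow_rate, eta_days = recovery_rates(
    total=13_000,       # ledgers on the decommissioned bookie
    fast_done=6_000,    # recovered in the first 2 days
    fast_days=2,
    elapsed_days=21,    # assumed length of "3 weeks"
    remaining=6_000,    # still under-replicated
)

print(f"fast phase: {fast_rate:.0f} ledgers/day")
print(f"slow phase: {slow_rate:.0f} ledgers/day")
print(f"days left at slow rate: {eta_days:.0f}")
```

   Under these assumptions the rate fell from about 3000 ledgers per day to 
about 53 per day.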
   
   #### Actual behavior
   The cluster has been slow to transfer data to the other healthy nodes. The 
logs show the following errors:
   1. ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error 
reading ledger entry:
   2. Error: Bookie handle is not available while reading (all 5 bookies are 
operational)
   3. Error: Too many requests to the same Bookie while reading
   
   
   #### Steps to reproduce
   Stop BookKeeper on one of the nodes and run decommissionbookie to 
replicate its data to the rest of the cluster.
   
   #### System configuration
   **Pulsar version**: 2.3
   **Operating system**: Amazon Linux 2
   **Java version**: openjdk version "1.8.0_222"
   
