TakaHiro0208 opened a new issue, #3316:
URL: https://github.com/apache/bookkeeper/issues/3316

   **BUG REPORT**
   
   ***Describe the bug***
   
   Our production Pulsar cluster has multiple bookie nodes configured with E-Qw-Qa (3-3-2), and autorecovery is enabled via `./bin/bookkeeper shell autorecovery -enable`. The BookKeeper version is 4.14.1. One bookie server went down, and the cluster started autorecovery. However, there is a ledger that cannot be read from the other 2 bookies in its ensemble; both return the error `Ledger 1294 not found` (it seems the ledger has been deleted).
   
   ```
   [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while performing readLac from ledger: 1294
   org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 1294 not found
           at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
           at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
   ```
   
   
   However, the ReplicationWorker keeps trying to rereplicate this ledger and keeps failing. According to the following log, it throws `BKNotEnoughBookiesException`, so `ReplicationWorker#run` keeps running and keeps trying to replicate a ledger that cannot be replicated. As a result, it sends a large number of recovery read requests to the other 2 bookies in the ensemble, which affects normal read requests.
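A toy model of the loop described above may make the cost clearer. This is a hypothetical sketch, not BookKeeper's `ReplicationWorker`: the class `RetryLoopSketch`, the `Outcome` enum, and the assumption that each attempt sends one read to each of the 2 surviving bookies are all illustrative. It shows that when a failed attempt leaves the ledger in the under-replication queue, the same ledger is picked up again every cycle, so the read count grows with the number of cycles instead of stopping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical toy model (not BookKeeper's ReplicationWorker) of a worker that
 * puts a failed ledger back into the under-replication queue and retries it.
 */
public class RetryLoopSketch {

    /** Result of one rereplication attempt (illustrative). */
    enum Outcome { REPLICATED, RETRY_LATER }

    /** Assumed number of surviving bookies contacted per attempt (E=3 minus one failed bookie). */
    static final int SURVIVING_BOOKIES = 2;

    /**
     * Runs up to {@code cycles} worker iterations for a single under-replicated
     * ledger and returns the total number of recovery reads sent to survivors.
     */
    static int recoveryReadsAfter(boolean ledgerDataPresent, int cycles) {
        Deque<Long> underReplicated = new ArrayDeque<>();
        underReplicated.add(1294L);
        int reads = 0;
        for (int i = 0; i < cycles && !underReplicated.isEmpty(); i++) {
            long ledgerId = underReplicated.poll();
            reads += SURVIVING_BOOKIES; // one read per surviving bookie, per attempt
            Outcome outcome = ledgerDataPresent ? Outcome.REPLICATED : Outcome.RETRY_LATER;
            if (outcome == Outcome.RETRY_LATER) {
                underReplicated.add(ledgerId); // ledger stays queued and is retried next cycle
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        System.out.println(recoveryReadsAfter(true, 100));  // 2
        System.out.println(recoveryReadsAfter(false, 100)); // 200
    }
}
```

With ledger data present, the model stops after one attempt (2 reads); with the data missing, it issues 200 reads in 100 cycles, which mirrors the unbounded retry described above.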
   
   ```
   [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 1294 did not hear success responses from all of ensemble
   [ReplicationWorker] INFO  org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 1294. Enough Bookies might not have available So, no harm to continue
   ```
   
   ```
   [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - IOException while trying to read last entry: 1294
   org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 1294
   ```
   
   
   The ZooKeeper metadata contains ledger 1294 under `/ledgers/underreplication/ledgers`.
   
   
![WeCom screenshot_40feaa39-c1d5-4ce1-ae37-6a7f1eadd339](https://user-images.githubusercontent.com/13505225/172157384-2bb225a8-c924-47d4-9d02-aa6f7046f4d7.png)
   
   
   ***To Reproduce***
   
   
   
   ***Expected behavior***
   
   Should autorecovery skip deleted ledgers when rereplicating?
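One possible shape for that behaviour, sketched under assumptions: before rereplicating, the worker checks whether the ledger still exists and, if it does not, drops it from the under-replication queue instead of retrying. The class `SkipDeletedLedgerSketch`, its `Pass` result type, the `LongPredicate` existence check, and ledger 1300 are hypothetical placeholders. How "deleted" should actually be detected (for example a metadata lookup, or `NoLedgerException` from every surviving bookie) is left open here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/**
 * Hypothetical sketch (not BookKeeper code): skip ledgers that no longer
 * exist instead of retrying them forever.
 */
public class SkipDeletedLedgerSketch {

    /** Outcome of one pass over the under-replication queue (hypothetical type). */
    static final class Pass {
        final List<Long> rereplicated = new ArrayList<>();
        final List<Long> skippedAsDeleted = new ArrayList<>();
    }

    /**
     * Walks the queued ledger ids once. Ledgers for which {@code ledgerExists}
     * returns false are recorded as skipped; the rest are recorded as
     * rereplicated (the actual fragment copy is not modelled here).
     */
    static Pass processQueue(List<Long> underReplicated, LongPredicate ledgerExists) {
        Pass pass = new Pass();
        for (long ledgerId : underReplicated) {
            if (!ledgerExists.test(ledgerId)) {
                // Deleted ledger: a real fix would also remove its entry under
                // /ledgers/underreplication/ledgers so it is never picked up again.
                pass.skippedAsDeleted.add(ledgerId);
                continue;
            }
            // Ledger still exists: this is where missing fragments would be copied.
            pass.rereplicated.add(ledgerId);
        }
        return pass;
    }

    public static void main(String[] args) {
        // Hypothetical state: ledger 1294 is gone, ledger 1300 still exists.
        LongPredicate exists = id -> id != 1294L;
        Pass pass = processQueue(Arrays.asList(1294L, 1300L), exists);
        System.out.println("rereplicated: " + pass.rereplicated);     // [1300]
        System.out.println("skipped:      " + pass.skippedAsDeleted); // [1294]
    }
}
```

Keeping the existence check as a pluggable predicate separates the open question (how to decide a ledger is deleted) from the queue handling, which only needs to stop retrying once that decision is made.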
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
