TakaHiro0208 opened a new issue, #3316:
URL: https://github.com/apache/bookkeeper/issues/3316
**BUG REPORT**
***Describe the bug***
Our production Pulsar cluster has multiple nodes with E-Qw-Qa (3-3-2).
Auto-recovery is enabled via "./bin/bookkeeper shell autorecovery -enable",
and the BookKeeper version is 4.14.1. One bookie server is now down, and the
cluster is performing auto-recovery. However, one ledger cannot be read from the
other 2 bookies in its ensemble; both return the same error: Ledger 1294 not
found (it seems the ledger has been deleted).
```
[BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while performing readLac from ledger: 1294
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 1294 not found
    at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
    at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
```
However, the ReplicationWorker keeps trying to re-replicate this ledger and
keeps failing. According to the following log, it throws
BKNotEnoughBookiesException, so ReplicationWorker#run keeps running and keeps
retrying a ledger that cannot be replicated. As a result, too many recovery
read requests are sent to the other 2 ensemble bookies, which affects normal
read requests.
```
[BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 1294 did not hear success responses from all of ensemble
[ReplicationWorker] INFO org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 1294. Enough Bookies might not have available So, no harm to continue
```
```
[BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - IOException while trying to read last entry: 1294
org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 1294
```
The ZooKeeper metadata has ledger 1294 under /ledgers/underreplication/ledgers.

***To Reproduce***
***Expected behavior***
Should auto-recovery skip deleted ledgers like this one when doing recovery?
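
The behavior asked about here could look roughly like the following. This is a minimal, self-contained sketch and not BookKeeper's actual ReplicationWorker code: every name in it (`SkipDeletedLedgersSketch`, `Outcome`, `Replicator`, `drain`, `maxAttempts`) is hypothetical, and the "not found" signal is assumed to be available to the loop as a distinct outcome. A ledger reported as not found is dropped from the under-replicated queue instead of being retried, while transient failures are retried a bounded number of times.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a replication loop.
// It does not use or mirror any real BookKeeper API.
public class SkipDeletedLedgersSketch {

    /** Hypothetical result of one replication attempt. */
    public enum Outcome { REPLICATED, LEDGER_NOT_FOUND, TRANSIENT_FAILURE }

    /** Hypothetical stand-in for whatever replicates a single ledger. */
    public interface Replicator {
        Outcome replicate(long ledgerId);
    }

    /**
     * Drains the queue of under-replicated ledger ids.
     * Ledgers reported as not found are dropped rather than re-queued;
     * transient failures are re-queued until maxAttempts is reached.
     * Returns the ids that were skipped because they were not found.
     */
    public static List<Long> drain(Deque<Long> underReplicated,
                                   Replicator replicator,
                                   int maxAttempts) {
        List<Long> skipped = new ArrayList<>();
        Map<Long, Integer> attempts = new HashMap<>();
        while (!underReplicated.isEmpty()) {
            long ledgerId = underReplicated.poll();
            int attempt = attempts.merge(ledgerId, 1, Integer::sum);
            Outcome outcome = replicator.replicate(ledgerId);
            if (outcome == Outcome.LEDGER_NOT_FOUND) {
                skipped.add(ledgerId);          // treated as deleted: never retried
            } else if (outcome == Outcome.TRANSIENT_FAILURE && attempt < maxAttempts) {
                underReplicated.add(ledgerId);  // retry later, bounded by maxAttempts
            }
            // REPLICATED or retries exhausted: the id simply leaves the queue
        }
        return skipped;
    }

    public static void main(String[] args) {
        Set<Long> deleted = Collections.singleton(1294L);
        Deque<Long> queue = new ArrayDeque<>(Arrays.asList(1293L, 1294L, 1295L));
        List<Long> skipped = drain(
                queue,
                id -> deleted.contains(id) ? Outcome.LEDGER_NOT_FOUND : Outcome.REPLICATED,
                3);
        System.out.println("skipped as deleted: " + skipped);
    }
}
```

In this sketch the deleted ledger is attempted exactly once, so it cannot generate a continuous stream of recovery reads against the remaining bookies. Whether "not found on every remaining bookie" is a safe enough signal to stop retrying in a real cluster is a separate question that the sketch does not answer.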