horizonzy opened a new issue, #4040:
URL: https://github.com/apache/bookkeeper/issues/4040
### Broker
Last week, in our testing environment, we found that the broker reported an
error when initializing a certain topic. The log is as follows.
```
2023-07-14T10:27:03,622+0000 [bookkeeper-ml-scheduler-OrderedScheduler-0-0]
INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed
ledger sn/system/persistent/__sla
2023-07-14T10:27:03,630+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on
bookie v3-0-o-bk-2.v3-0-o-bk-headless.chaos-platform.svc.cluster.local:3181
code EIO
2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on
bookie v3-0-o-bk-1.v3-0-o-bk-headless.chaos-platform.svc.cluster.local:3181
code EIO
2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
ERROR org.apache.bookkeeper.client.ReadLastConfirmedOp - While
readLastConfirmed ledger: 2298628 did not hear success responses from all
quorums, QuorumCoverage(e:2,w:2,a:2) = [-1, -1]
2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl -
[sn/system/persistent/__sla] Failed to open ledger 2298628: Error while
recovering ledger
2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl -
[sn/system/persistent/__sla] Closing managed ledger
2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0]
WARN org.apache.pulsar.broker.service.BrokerService - Failed to create topic
persistent://sn/system/__sla
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger error code: -10
```
It shows that the ledger `2298628` is the last ledger of topic
`sn/system/persistent/__sla`, when the broker load topic
`sn/system/persistent/__sla`, it will try to open the last ledger `2298628`. In
the open process, it checks the ledger state, and found that the state is
`Open`, then triggers ledger recovery, sends `ReadLastConfirmed` to each bookie
to confirm the LAC of ledger, but the bookie server handles the request failed,
the ledger `2298628` open failed.
### Bookie
#### 1. Use Arthas to trace the bk1, we found that when Bookie read ledger
2298628 LAC, it throws the exception as following.
```
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.internalReadEntry(DefaultEntryLogger.java:840)
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.readEntry(DefaultEntryLogger.java:814)
at
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getLastEntry(SingleDirectoryDbLedgerStorage.java:753)
at
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.doGetEntry(SingleDirectoryDbLedgerStorage.java:570)
at
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getEntry(SingleDirectoryDbLedgerStorage.java:555)
at
org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.getEntry(DbLedgerStorage.java:356)
at
org.apache.bookkeeper.bookie.LedgerDescriptorImpl.readEntry(LedgerDescriptorImpl.java:161)
at
org.apache.bookkeeper.bookie.BookieImpl.readEntry(BookieImpl.java:1117)
at
org.apache.bookkeeper.proto.ReadEntryProcessor.processPacket(ReadEntryProcessor.java:87)
at
org.apache.bookkeeper.proto.PacketProcessorBase.run(PacketProcessorBase.java:200)
at
org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
at
org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by:
org.apache.bookkeeper.bookie.DefaultEntryLogger$EntryLookupException$MissingLogFileException:
Missing entryLog 2574 for ledgerId 2298628, entry 103 at offset 1396307
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.getFCForEntryInternal(DefaultEntryLogger.java:760)
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.internalReadEntry(DefaultEntryLogger.java:832)
... 13 more
```
It shows that the ledger corresponding entry log file `2574` not exists.
##### 2. Search `2574` in the bk1, we found that the `2574` have been
removed due to compaction.
```
2023-06-13T23:59:26,169+0000 [GarbageCollectorThread-6-1] INFO
org.apache.bookkeeper.bookie.TransactionalEntryLogCompactor - Compacted entry
log : 2574.
2023-06-13T23:59:26,169+0000 [GarbageCollectorThread-6-1] INFO
org.apache.bookkeeper.bookie.GarbageCollectorThread - Removing entry log
metadata for 2574
```
In theory, after 2574 is compressed into the new entry log, the location of
ledger 2298628 should be mapped to the new entry log. However, the end result
is that the location of ledger 2298628 still points to 2574.
And then we discovered there is such a strange log.
```
2023-06-13T23:59:25,165+0000 [GarbageCollectorThread-6-1] WARN
org.apache.bookkeeper.bookie.DefaultEntryLogger - Short read for entry size
from entrylog 2574
```
This log was printed during the scan of the entry log, indicating that there
was a problem with the content of the entry log file, and the scan was
terminated in advance. Combining with the compaction of entryLog 2574, we can
analyze that the scan did not reach ledger 2298628 and ended prematurely due to
the issue with the entry log file content, resulting in the data of ledger
2298628 not being transferred to the new entry log file.
Therefore, the problem becomes why there was an issue with the content of
entryLog 2574?
#### 3. The DefaultEntryLog lastId mechanism.
In DefaultEntryLog, every time a new entryLog is created, its ID will be
written to the file "lastId". After the bookie is restarted, the "lastId" file
is read and its value is assigned to the "lastId" variable. The next time a new
entryLog is created, "lastId + 1" is used as the ID for the new entryLog. This
way, after a bookie is restarted, the IDs of the entryLogs are continuous and
won't conflict with each other.
#### 4. The DirectEntryLog lastId mechanism.
In DirectEntryLog, the way to obtain "lastId" is different. It will get all
the logs under the current ledger folder, parse the log IDs, and then select
the ID with the largest gap as the "lastId". The next time a new entryLog is
created, "lastId + 1" is used as the ID for the new entryLog. Under this
mechanism, "lastId" file is not read, and the ID is not written back to the
"lastId" file when a new file is created.
#### 5. DefaultEntryLog -> DirectEntryLog
When DefaultEntryLog is switched to DirectEntryLog, the ID of the new
entryLog created by DirectEntryLog is continuous with the ID in the "lastId"
file of DefaultEntryLog. However, since DirectEntryLog does not write back to
the "lastId" file when creating a new entryLog, we may see that the ID of the
latest entryLog in the ledger folder is much larger than the ID in the "lastId"
file.


##### 6. DefaultEntryLog -> DirectEntryLog -> DefaultEntryLog
When DirectEntryLog is switched back to DefaultEntryLog, DefaultEntryLog
reads the "lastId" file and adds 1 to it as the ID for the new entryLog.
However, this ID is much smaller than the ID of the current latest entryLog.
When DefaultEntryLog writes an entry, it opens the entryLog that DirectEntryLog
has already written data to and overwrites the data. As a result, the content
of the entryLog file is corrupted.
```
2023-06-11T12:59:06,748+0000 [GarbageCollectorThread-8-1] INFO
org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta
from entryLogId: 2574
2023-06-12T19:40:08,420+0000 [GarbageCollectorThread-6-1] INFO
org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file
/pulsar/data/bookkeeper/ledgers-0/current/a0e.log.compacting for logId 2574.
```
The logs show that the entryLog `2574` was scanned at
2023-06-11T12:59:06,748. After one day, 2023-06-12T19:40:08,420, the `entryLog`
2574 is created again. Then, write the data into entryLog `2574` again.
## Influence
#### The original data may lose.
Example:
Ledger 0 has 100 entries; the entry 0-100 has been written into entryLog
`0`, each entry index of Ledger 0 points to entryLog 0.
After a while, there is a new ledger 1, which also has 100 entries, and also
write the entries into entryLog `0`, each entry index of Ledger 1 also points
to entryLog 0.
When reading an entry from Ledger 0, if an entry from Ledger 1 overwrites
the location of the entry, an exception will be thrown. If the entry location
is not overwritten by an entry from Ledger 1, the read operation will be
successful.
And you will see the warning logs like as followings:
```
2023-06-21T17:11:47,300+0000 [BookieReadThreadPool-OrderedExecutor-0-0] WARN
org.apache.bookkeeper.bookie.DefaultEntryLogger - Sanity check failed for
entry size of 991045697 at location 47588603 in 2574
```
```
2023-06-23T19:47:40,726+0000 [GarbageCollectorThread-6-1] WARN
org.apache.bookkeeper.bookie.DefaultEntryLogger - Short read for entry size
from entrylog 2574
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]