horizonzy opened a new issue, #4040:
URL: https://github.com/apache/bookkeeper/issues/4040

   ### Broker
   Last week, in our testing environment, we found that the broker reported an 
error when initializing a certain topic. The log is as follows.
   
   ```
   2023-07-14T10:27:03,622+0000 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] 
INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed 
ledger sn/system/persistent/__sla
   2023-07-14T10:27:03,630+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on 
bookie v3-0-o-bk-2.v3-0-o-bk-headless.chaos-platform.svc.cluster.local:3181 
code EIO
   2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on 
bookie v3-0-o-bk-1.v3-0-o-bk-headless.chaos-platform.svc.cluster.local:3181 
code EIO
   2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
ERROR org.apache.bookkeeper.client.ReadLastConfirmedOp - While 
readLastConfirmed ledger: 2298628 did not hear success responses from all 
quorums, QuorumCoverage(e:2,w:2,a:2) = [-1, -1]
   2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[sn/system/persistent/__sla] Failed to open ledger 2298628: Error while 
recovering ledger
   2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[sn/system/persistent/__sla] Closing managed ledger
   2023-07-14T10:27:03,705+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] 
WARN  org.apache.pulsar.broker.service.BrokerService - Failed to create topic 
persistent://sn/system/__sla
   org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger error code: -10
   ```
   It shows that the ledger `2298628` is the last ledger of topic 
`sn/system/persistent/__sla`, when the broker load topic 
`sn/system/persistent/__sla`, it will try to open the last ledger `2298628`. In 
the open process, it checks the ledger state, and found that the state is 
`Open`, then triggers ledger recovery, sends `ReadLastConfirmed` to each bookie 
to confirm the LAC of ledger, but the bookie server handles the request failed, 
the ledger `2298628` open failed.
   
   
   ### Bookie
   
   #### 1. Use Arthas to trace the bk1, we found that when Bookie read ledger 
2298628 LAC, it throws the exception as following.
   ```
        at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.internalReadEntry(DefaultEntryLogger.java:840)
        at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.readEntry(DefaultEntryLogger.java:814)
        at 
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getLastEntry(SingleDirectoryDbLedgerStorage.java:753)
        at 
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.doGetEntry(SingleDirectoryDbLedgerStorage.java:570)
        at 
org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getEntry(SingleDirectoryDbLedgerStorage.java:555)
        at 
org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.getEntry(DbLedgerStorage.java:356)
        at 
org.apache.bookkeeper.bookie.LedgerDescriptorImpl.readEntry(LedgerDescriptorImpl.java:161)
        at 
org.apache.bookkeeper.bookie.BookieImpl.readEntry(BookieImpl.java:1117)
        at 
org.apache.bookkeeper.proto.ReadEntryProcessor.processPacket(ReadEntryProcessor.java:87)
        at 
org.apache.bookkeeper.proto.PacketProcessorBase.run(PacketProcessorBase.java:200)
        at 
org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
        at 
org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: 
org.apache.bookkeeper.bookie.DefaultEntryLogger$EntryLookupException$MissingLogFileException:
 Missing entryLog 2574 for ledgerId 2298628, entry 103 at offset 1396307
        at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.getFCForEntryInternal(DefaultEntryLogger.java:760)
        at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.internalReadEntry(DefaultEntryLogger.java:832)
        ... 13 more
   ```
   
   It shows that the ledger corresponding entry log file `2574` not exists.
   
   
   ##### 2. Search `2574` in the bk1, we found that the `2574` have been 
removed due to compaction.
   ```
   2023-06-13T23:59:26,169+0000 [GarbageCollectorThread-6-1] INFO  
org.apache.bookkeeper.bookie.TransactionalEntryLogCompactor - Compacted entry 
log : 2574.
   
   2023-06-13T23:59:26,169+0000 [GarbageCollectorThread-6-1] INFO  
org.apache.bookkeeper.bookie.GarbageCollectorThread - Removing entry log 
metadata for 2574
   ```
   
   
   In theory, after 2574 is compressed into the new entry log, the location of 
ledger 2298628 should be mapped to the new entry log. However, the end result 
is that the location of ledger 2298628 still points to 2574.
   
   
   And then we discovered there is such a strange log.
   
   
   ```
   2023-06-13T23:59:25,165+0000 [GarbageCollectorThread-6-1] WARN  
org.apache.bookkeeper.bookie.DefaultEntryLogger - Short read for entry size 
from entrylog 2574       
   ```
   
   This log was printed during the scan of the entry log, indicating that there 
was a problem with the content of the entry log file, and the scan was 
terminated in advance. Combining with the compaction of entryLog 2574, we can 
analyze that the scan did not reach ledger 2298628 and ended prematurely due to 
the issue with the entry log file content, resulting in the data of ledger 
2298628 not being transferred to the new entry log file.
   
   
   Therefore, the problem becomes why there was an issue with the content of 
entryLog 2574?
   
   
   #### 3. The DefaultEntryLog lastId mechanism.
   In DefaultEntryLog, every time a new entryLog is created, its ID will be 
written to the file "lastId". After the bookie is restarted, the "lastId" file 
is read and its value is assigned to the "lastId" variable. The next time a new 
entryLog is created, "lastId + 1" is used as the ID for the new entryLog. This 
way, after a bookie is restarted, the IDs of the entryLogs are continuous and 
won't conflict with each other.
   
   
   #### 4. The DirectEntryLog lastId mechanism.
   In DirectEntryLog, the way to obtain "lastId" is different. It will get all 
the logs under the current ledger folder, parse the log IDs, and then select 
the ID with the largest gap as the "lastId". The next time a new entryLog is 
created, "lastId + 1" is used as the ID for the new entryLog. Under this 
mechanism, "lastId" file is not read, and the ID is not written back to the 
"lastId" file when a new file is created.
   
   
   #### 5. DefaultEntryLog -> DirectEntryLog
   When DefaultEntryLog is switched to DirectEntryLog, the ID of the new 
entryLog created by DirectEntryLog is continuous with the ID in the "lastId" 
file of DefaultEntryLog. However, since DirectEntryLog does not write back to 
the "lastId" file when creating a new entryLog, we may see that the ID of the 
latest entryLog in the ledger folder is much larger than the ID in the "lastId" 
file.
   
   
![image](https://github.com/streamnative/eng-support-tickets/assets/22524871/3c997c2c-6bf7-4fa9-bb15-cb213dd8c8eb)
   
   ![image 
(1)](https://github.com/streamnative/eng-support-tickets/assets/22524871/c19cf8ff-f892-4d3c-959b-9fe1dc71ea51)
   
   ##### 6. DefaultEntryLog -> DirectEntryLog -> DefaultEntryLog
   When DirectEntryLog is switched back to DefaultEntryLog, DefaultEntryLog 
reads the "lastId" file and adds 1 to it as the ID for the new entryLog. 
However, this ID is much smaller than the ID of the current latest entryLog. 
When DefaultEntryLog writes an entry, it opens the entryLog that DirectEntryLog 
has already written data to and overwrites the data. As a result, the content 
of the entryLog file is corrupted.
   
   ```
   2023-06-11T12:59:06,748+0000 [GarbageCollectorThread-8-1] INFO  
org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta 
from entryLogId: 2574
   
   2023-06-12T19:40:08,420+0000 [GarbageCollectorThread-6-1] INFO  
org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file 
/pulsar/data/bookkeeper/ledgers-0/current/a0e.log.compacting for logId 2574.
   ```
   
   The logs show that the entryLog `2574` was scanned at 
2023-06-11T12:59:06,748. After one day, 2023-06-12T19:40:08,420, the `entryLog` 
2574 is created again. Then, write the data into entryLog `2574` again.
   
   
   ## Influence
   
   #### The original data may lose.
   Example: 
   Ledger 0 has 100 entries; the entry 0-100 has been written into entryLog 
`0`, each entry index of Ledger 0 points to entryLog 0.
   
   After a while, there is a new ledger 1, which also has 100 entries, and also 
write the entries into entryLog `0`, each entry index of Ledger 1 also points 
to entryLog 0.
   
   When reading an entry from Ledger 0, if an entry from Ledger 1 overwrites 
the location of the entry, an exception will be thrown. If the entry location 
is not overwritten by an entry from Ledger 1, the read operation will be 
successful.
   
   And you will see the warning logs like as followings:
   ```
   2023-06-21T17:11:47,300+0000 [BookieReadThreadPool-OrderedExecutor-0-0] WARN 
 org.apache.bookkeeper.bookie.DefaultEntryLogger - Sanity check failed for 
entry size of 991045697 at location 47588603 in 2574
   ```
   
   ```
   2023-06-23T19:47:40,726+0000 [GarbageCollectorThread-6-1] WARN  
org.apache.bookkeeper.bookie.DefaultEntryLogger - Short read for entry size 
from entrylog 2574
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to