hangc0276 opened a new pull request, #3940: URL: https://github.com/apache/bookkeeper/pull/3940
### Motivation

When the bookie triggers compaction, with either the Transaction compactor or the EntryLogCompactor, the write throughput becomes jittery.

<img width="853" alt="image" src="https://user-images.githubusercontent.com/5436568/235288456-482bc1c2-3511-48a3-a698-f70bf93c20dd.png">
<img width="847" alt="image" src="https://user-images.githubusercontent.com/5436568/235288468-d51fe344-ed10-4ef9-ad7f-62fa83577a7b.png">
<img width="843" alt="image" src="https://user-images.githubusercontent.com/5436568/235288476-e723b87a-e600-457e-af44-5121d3a4204b.png">

I took a deep look at the logs and found that while transaction compaction was compacting entry log 491, there were write cache flushes and new entry log file rolls.

```
2023-04-26T06:14:16,356+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.TransactionalEntryLogCompactor - Compacting entry log 491 with usage 8.796982355448682E-6.
2023-04-26T06:14:16,356+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file /pulsar/data/bookkeeper/ledgers-0/current/5d3.log.compacting for logId 1491.
2023-04-26T06:14:17,198+0000 [bookie-io-8-2] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:19,030+0000 [bookie-io-8-2] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:19,806+0000 [db-storage-4-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file : createNewLog = false, reachEntryLogLimit = true
2023-04-26T06:14:19,806+0000 [db-storage-4-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Flushing entry logger 1488 back to filesystem, pending for syncing entry loggers : [BufferedChannel{logId=1488, logFile =/pulsar/data/bookkeeper/ledgers-0/current/5d0.log, ledgerIdAssigned=-1}].
2023-04-26T06:14:19,807+0000 [pool-4-thread-1] INFO org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file /pulsar/data/bookkeeper/ledgers-0/current/5d4.log for logId 1492.
2023-04-26T06:14:19,968+0000 [db-storage-4-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 1488 to disk.
2023-04-26T06:14:20,971+0000 [bookie-io-8-2] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:22,999+0000 [bookie-io-8-2] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:24,960+0000 [bookie-io-8-2] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:27,109+0000 [bookie-io-8-1] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:29,204+0000 [bookie-io-8-1] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:32,569+0000 [bookie-io-8-1] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:32,610+0000 [BookieJournal-3181] INFO org.apache.bookkeeper.bookie.JournalChannel - Opening journal /pulsar/data/bookkeeper/journal-0/current/187bb72bb20.txn
2023-04-26T06:14:35,383+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.DefaultEntryLogger - Flushed compaction log file /pulsar/data/bookkeeper/ledgers-0/current/5d3.log.compacting with logId 1491.
2023-04-26T06:14:36,265+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file : createNewLog = false, reachEntryLogLimit = true
2023-04-26T06:14:36,266+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Flushing entry logger 1490 back to filesystem, pending for syncing entry loggers : [BufferedChannel{logId=1490, logFile=/pulsar/data/bookkeeper/ledgers-0/current/5d2.log, ledgerIdAssigned=-1}].
2023-04-26T06:14:36,266+0000 [pool-4-thread-1] INFO org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file /pulsar/data/bookkeeper/ledgers-0/current/5d5.log for logId 1493.
2023-04-26T06:14:36,362+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 1490 to disk.
2023-04-26T06:14:36,720+0000 [bookie-io-8-1] INFO org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
2023-04-26T06:14:36,970+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.Journal - garbage collected journal 187bb72bb1f.txn
2023-04-26T06:14:37,498+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Removing entry log metadata for 491
2023-04-26T06:14:37,498+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.TransactionalEntryLogCompactor - Compacted entry log : 491.
```

Let's take a look at the compaction steps first before analyzing the root cause.
#### Transaction Compactor

- Step1: Create a new entry log file with a `compacting` suffix
- Step2: Open the original entry log file and read the ledgers one by one, filter out the deleted ledgers, and write the remaining ledgers into the **newly created entry log file**
- Step3: Flush the newly created entry log file
- **Step4: Flush the current ledger storage (trigger a new checkpoint)** https://github.com/apache/bookkeeper/blob/c765aea600fde1a8ac7cb8a53a1157a372026894/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/SingleDirectoryDbLedgerStorage.java#L947
- Step5: Update the entries' new index in the newly created entry log file into RocksDB
- Step6: Remove the `compacting` suffix
- Step7: Delete the original entry log file and remove it from the entryLogMetaMap

#### EntryLog Compactor

- Step1: Open the original entry log file and read the ledgers one by one, filter out the deleted ledgers, and write the remaining ledgers into the **current writing entry log file**
- Step2: Flush the current writing entry log file to ensure the written ledgers are flushed to disk
- **Step3: Flush the current ledger storage (trigger a new checkpoint)** https://github.com/apache/bookkeeper/blob/c765aea600fde1a8ac7cb8a53a1157a372026894/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/SingleDirectoryDbLedgerStorage.java#L947
- Step4: Update the entries' new index in the current entry log file into RocksDB
- Step5: Delete the original entry log file and remove it from the entryLogMetaMap

Step4 of the Transaction Compactor and Step3 of the EntryLog Compactor trigger a new checkpoint, which swaps the write cache regardless of how much data it holds. Looking back at the whole sequence, the intent of `Flush the current ledger storage` is presumably to ensure the remaining data has been flushed into the entry log file before the entries' index is updated in RocksDB.
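The write-cache effect described above can be illustrated with a minimal, hypothetical model (class and method names here are illustrative, not BookKeeper's real `WriteCache`/`SingleDirectoryDbLedgerStorage` API): a normal write path only swaps the cache when it is full, while a forced checkpoint swaps it unconditionally, even when the cache is nearly empty.

```java
// A minimal sketch, NOT BookKeeper's real classes: writes fill an active cache,
// which is swapped and flushed either when it overflows, or unconditionally
// whenever a checkpoint is forced -- e.g. by the compactor's "flush the current
// ledger storage" step, no matter how little data the cache holds.
public class WriteCacheModel {
    private long activeBytes = 0;   // bytes buffered in the cache accepting writes
    private final long maxBytes;    // capacity before a flush is forced
    private int checkpoints = 0;    // how many cache swaps have happened

    public WriteCacheModel(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Buffer an entry; swap the cache only when it would overflow. */
    public void addEntry(long size) {
        if (activeBytes + size > maxBytes) {
            checkpoint(); // the "Write cache is full, triggering flush" path
        }
        activeBytes += size;
    }

    /** Swap and flush the active cache, regardless of how much it holds. */
    public void checkpoint() {
        // In the real storage the active cache is swapped with a flushing cache
        // and written to entry log files; this model only resets the counter.
        activeBytes = 0;
        checkpoints++;
    }

    public long bytesInActiveCache() { return activeBytes; }
    public int checkpointCount() { return checkpoints; }
}
```

The point of the sketch is only that a compaction-triggered checkpoint pays the swap-and-flush cost even for an almost-empty cache, which is what shows up as write jitter.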
However, both the `Transaction Compactor` and the `EntryLog Compactor` already trigger their own flush operations to ensure the remaining data is flushed into the entry log file. So I think the `Flush the current ledger storage` step is unnecessary, and it causes extra throughput impact.

### Changes

Remove the flush operation from `SingleDirectoryDbLedgerStorage#updateEntriesLocations`.
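The ordering argument behind this change can be sketched as a tiny model (all names are hypothetical; this is not the real compactor code): by the time the entry locations are updated in RocksDB, the compactor has already flushed the compaction log in an earlier step, so a storage-wide checkpoint at that point adds no durability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Transaction Compactor step ordering discussed
// above. With the redundant checkpoint removed, the compacted data is still
// durable before the RocksDB index update, because the compaction log was
// flushed in Step3.
public class CompactionOrderModel {

    public static List<String> runTransactionCompaction(boolean flushInsideUpdateLocations) {
        List<String> events = new ArrayList<>();
        events.add("create compacting log");        // Step1
        events.add("copy live entries");            // Step2
        events.add("flush compaction log");         // Step3: data is durable here
        if (flushInsideUpdateLocations) {
            // Step4: the storage-wide checkpoint this PR proposes to remove
            events.add("checkpoint ledger storage");
        }
        events.add("update locations in RocksDB");  // Step5
        events.add("remove compacting suffix");     // Step6
        events.add("delete original entry log");    // Step7
        return events;
    }
}
```

In both variants the compaction-log flush precedes the RocksDB location update, which is the invariant the removed flush was presumably there to protect.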
