hangc0276 opened a new pull request, #3192: URL: https://github.com/apache/bookkeeper/pull/3192
### Motivation
When compacting an entry log file, the compactor scans the entry header metadata from the beginning of the file to the end. When an entry has not been deleted, its data is read and written into a new entry log file. The throttle is applied only to entries that have not been deleted.

https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java#L1012-L1051

For deleted entries, no throttling is applied. However, we use `BufferedChannel` to read the entry log file, and it prefetches data from the file whenever the read buffer misses.

https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L91-L97

For an entry log file in which more than 90% of the entries have been deleted, the compactor scans the entries' header metadata one by one. Each header read misses the `BufferedChannel` read buffer and triggers a prefetch from disk; the following header reads miss again and keep prefetching from disk without any throttle. This drives the ledger disk I/O utilization high.

Moreover, each prefetch from disk also triggers OS PageCache readahead. For the compaction access pattern, this readahead pollutes the PageCache and may also affect journal sync latency. For that problem, we can use Direct IO to reduce the PageCache impact: https://github.com/apache/bookkeeper/issues/2943

### Changes
Add throttling to the entry header metadata check stage.
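For illustration, here is a minimal sketch of what throttling the header-scan phase could look like. It is not the actual patch: `EntrySource`, `HEADER_SIZE`, and `readThrottleBytesPerSec` are assumed names, and Guava's `RateLimiter` stands in for the compaction throttler. The point is that every header read now pays into the rate limiter, so a log full of deleted entries can no longer prefetch from disk unthrottled.

```java
import com.google.common.util.concurrent.RateLimiter;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not the BookKeeper implementation) of throttling the
 * header-scan phase of compaction, ignoring the entry log file header for brevity.
 */
public class ThrottledHeaderScan {

    // Entry framing assumed here: 4-byte size field, then a payload that
    // starts with an 8-byte ledgerId and an 8-byte entryId.
    private static final int HEADER_SIZE = 4 + 8 + 8;

    /** Illustrative stand-in for the BufferedReadChannel read path. */
    interface EntrySource {
        /** Reads up to dst.remaining() bytes at offset; returns bytes read, or -1 at EOF. */
        int read(ByteBuffer dst, long offset);
    }

    static void scan(EntrySource src, long readThrottleBytesPerSec) {
        // Permits are bytes, so the header reads themselves are now budgeted,
        // not just the copy of surviving entries.
        RateLimiter limiter = RateLimiter.create(readThrottleBytesPerSec);
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
        long pos = 0;
        while (true) {
            header.clear();
            limiter.acquire(HEADER_SIZE);            // throttle the metadata check itself
            if (src.read(header, pos) < HEADER_SIZE) {
                break;                               // end of entry log
            }
            header.flip();
            int entrySize = header.getInt();         // payload length after the size field
            long ledgerId = header.getLong();
            // ... consult ledger metadata to decide whether the entry is live;
            // live entries are copied under the existing per-entry throttle.
            pos += 4 + entrySize;                    // skip size field + payload
        }
    }
}
```

With the scan phase drawing from the same bytes-per-second budget as the copy phase, compaction I/O stays bounded even when almost every entry in the log has been deleted.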
