hangc0276 opened a new pull request, #3192:
URL: https://github.com/apache/bookkeeper/pull/3192

   ### Motivation
   When compacting an entry log file, the compactor scans each entry's header 
metadata from the beginning to the end of the file. If an entry has not been 
deleted, it reads the entry data and writes it into a new entry log file. The 
throttle applies only to entries that have not been deleted.
   
https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java#L1012-L1051
   
   For deleted entries, the throttle is not applied. However, 
we use `BufferedChannel` to read the entry log file, and it prefetches data 
from the file whenever a read misses the read buffer.
   
https://github.com/apache/bookkeeper/blob/b0030ab5d9132e5c5cd5092b8e6dab79e9fbf16c/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L91-L97
   
   Consider an entry log file in which more than 90% of the entries have been 
deleted. The compactor scans those entries' header metadata one by one. Reading 
one entry's metadata misses the `BufferedChannel` read buffer and triggers a 
prefetch from disk. The header metadata reads for the following entries also 
miss the read buffer and keep prefetching from disk without any throttling. 
This drives the ledger disk IO utilization high. 
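
The effect described above can be illustrated with a small model. This is only a sketch under stated assumptions: `PrefetchingReader` and `HeaderScanDemo` are hypothetical names, not BookKeeper classes, and the header size, entry size, and read capacity are illustrative values. The model only counts how many bytes a prefetch-on-miss reader would fetch from disk while scanning headers of deleted entries.

```java
// Hypothetical, simplified model of a reader that refills its buffer on a miss.
// It is NOT BookKeeper's implementation; it only counts simulated disk reads.
class PrefetchingReader {
    private final long fileSize;
    private final int readCapacity;   // bytes fetched from disk on each miss
    private long bufferStart = -1;    // file offset of the first buffered byte
    private long bufferEnd = -1;      // file offset one past the last buffered byte
    long prefetchCount = 0;
    long bytesFetchedFromDisk = 0;

    PrefetchingReader(long fileSize, int readCapacity) {
        if (readCapacity <= 0) {
            throw new IllegalArgumentException("readCapacity must be positive");
        }
        this.fileSize = fileSize;
        this.readCapacity = readCapacity;
    }

    // Simulates reading `length` bytes starting at `pos`.
    void read(long pos, int length) {
        long end = pos + length;
        if (pos < 0 || end > fileSize) {
            throw new IllegalArgumentException("read past end of file");
        }
        while (pos < end) {
            if (pos < bufferStart || pos >= bufferEnd) {
                // Buffer miss: prefetch up to readCapacity bytes from "disk".
                bufferStart = pos;
                bufferEnd = Math.min(fileSize, pos + readCapacity);
                prefetchCount++;
                bytesFetchedFromDisk += bufferEnd - bufferStart;
            }
            pos = Math.min(end, bufferEnd);
        }
    }
}

class HeaderScanDemo {
    // Reads only the header of each fixed-size entry, skipping every body,
    // as a scan over fully deleted entries would.
    static PrefetchingReader scanHeaders(int entryCount, int headerSize,
                                         int entrySize, int readCapacity) {
        long fileSize = (long) entryCount * (headerSize + entrySize);
        PrefetchingReader reader = new PrefetchingReader(fileSize, readCapacity);
        long pos = 0;
        for (int i = 0; i < entryCount; i++) {
            reader.read(pos, headerSize);   // header metadata read
            pos += headerSize + entrySize;  // skip the deleted entry's body
        }
        return reader;
    }
}
```

In this model, whenever the entry body is larger than the read capacity, every header read lands outside the buffer, so each of the header reads causes its own unthrottled prefetch.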
   
   Moreover, each prefetch from disk also triggers OS PageCache prefetch. 
During compaction, this OS PageCache prefetch pollutes the PageCache and may 
also affect the journal sync latency. For this problem, we can use Direct IO 
to reduce the PageCache effect. 
https://github.com/apache/bookkeeper/issues/2943
   
   ### Changes
   Apply the throttle during the entry header metadata check stage as well.
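
A minimal sketch of the idea, not the code in this PR: the `Throttler` interface, `CountingThrottler`, `Entry`, and `HEADER_SIZE` below are hypothetical stand-ins, and charging exactly the header size per entry is an assumption made for illustration. It compares how many permits are requested when only copied entries are throttled with how many are requested when header checks are throttled too.

```java
import java.util.List;

class ThrottledScanSketch {
    // Hypothetical stand-in for a compaction rate limiter.
    interface Throttler {
        void acquire(int permits);
    }

    // Records the permits requested instead of actually blocking.
    static final class CountingThrottler implements Throttler {
        long acquired = 0;

        @Override
        public void acquire(int permits) {
            acquired += permits;
        }
    }

    // Hypothetical entry descriptor: body size and whether it was deleted.
    static final class Entry {
        final int size;
        final boolean deleted;

        Entry(int size, boolean deleted) {
            this.size = size;
            this.deleted = deleted;
        }
    }

    static final int HEADER_SIZE = 20; // illustrative value, an assumption

    // Scans all entries and returns how many live entries would be copied.
    // With throttleHeaders=false only copied entries are throttled (the
    // behavior described in Motivation); with true, each header check is
    // throttled as well.
    static int scan(List<Entry> entries, Throttler throttler, boolean throttleHeaders) {
        int copied = 0;
        for (Entry e : entries) {
            if (throttleHeaders) {
                throttler.acquire(HEADER_SIZE); // throttle the header check
            }
            if (e.deleted) {
                continue; // deleted entry: nothing to copy
            }
            throttler.acquire(e.size); // throttle the copy of a live entry
            copied++;
        }
        return copied;
    }
}
```

With nine deleted entries and one live entry of 1000 bytes, the first mode requests 1000 permits and the second requests 1200, so the scan over deleted entries is no longer free from the throttle's point of view.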


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
