lizhimins opened a new issue, #10520:
URL: https://github.com/apache/rocketmq/issues/10520

   ### Before Creating the Bug Report
   
   - [x] I found a bug, not just asking a question, which should be created in 
[GitHub Discussions](https://github.com/apache/rocketmq/discussions).
   - [x] I have searched the [GitHub 
Issues](https://github.com/apache/rocketmq/issues) and [GitHub 
Discussions](https://github.com/apache/rocketmq/discussions) of this repository 
and believe that this is not a duplicate.
   - [x] I have confirmed that this bug belongs to the current repository, not 
other repositories of RocketMQ.
   
   ### Runtime platform environment
   
   OS: Linux (NVMe SSD, cloud disks such as Alibaba Cloud ESSD)
   
   ### RocketMQ version
   
   branch: develop
   version: 5.3.x
   
   ### Describe the Bug
   
   `ConsumeQueue.correctMinOffset` runs a binary search over mmap-backed files, 
which is a random access pattern. The default Linux `read_ahead_kb` on NVMe 
devices is large relative to the single page each probe needs, so every page 
fault during the search pulls in far more data than is used, producing periodic 
disk read pulses.
   
   On cloud disks where read and write bandwidth share a single quota, these 
read pulses crowd out CommitLog writes and cause periodic send-RT spikes. In 
our production case, send p99 jumped from ~4ms to ~26ms every 8-9 minutes, with 
~244x read amplification.
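
   To make the amplification mechanism concrete, here is a rough model in 
Python. It is only a sketch under stated assumptions: 20-byte entries, 4 KiB 
pages, every probe hitting a cold page, and each cold fault reading one aligned 
window of `read_ahead_kb`. The function name and parameters are hypothetical, 
and the model is not meant to reproduce the ~244x figure measured in production.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
ENTRY_SIZE = 20    # assumed fixed entry size in bytes


def estimate_amplification(entries, target_index, read_ahead_kb):
    """Simulate a lower-bound binary search and compare bytes needed vs. read.

    Returns (pages_touched, windows_read, amplification). Only the first byte
    of each probed entry is considered, so entries that straddle a page
    boundary are undercounted.
    """
    window = read_ahead_kb * 1024
    pages, windows = set(), set()
    lo, hi = 0, entries
    while lo < hi:
        mid = (lo + hi) // 2
        offset = mid * ENTRY_SIZE
        pages.add(offset // PAGE_SIZE)   # page the probe actually needs
        windows.add(offset // window)    # read-ahead window it would pull in
        if mid < target_index:
            lo = mid + 1
        else:
            hi = mid
    bytes_needed = len(pages) * PAGE_SIZE
    bytes_read = len(windows) * window
    return len(pages), len(windows), bytes_read / bytes_needed


if __name__ == "__main__":
    for ra_kb in (4, 128, 1024):
        print(ra_kb, estimate_amplification(300_000, 0, ra_kb))
```

   With a 4 KiB window the model reads exactly the pages it touches. Larger 
windows inflate the bytes read per probe, which is the effect described above.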
   
   ### Steps to Reproduce
   
   1. Run a Broker with a large number of ConsumeQueue instances (e.g. 10000+) 
on NVMe storage
   2. Let disk usage approach the cleanup threshold so `correctMinOffset` runs 
frequently
   3. Observe periodic disk read pulses and send-RT spikes via `dstat` / 
`pidstat`
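
   Before reproducing, it can help to confirm the read-ahead setting on the 
test host. The sketch below reads `queue/read_ahead_kb` for each device under 
`/sys/block`. The helper name is hypothetical, and the `sys_block` parameter 
exists only so the function can be pointed at a different directory.

```python
from pathlib import Path


def read_ahead_settings(sys_block="/sys/block"):
    """Return {device_name: read_ahead_kb} for every readable block device."""
    settings = {}
    root = Path(sys_block)
    if not root.is_dir():
        return settings  # not a Linux host, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        knob = dev / "queue" / "read_ahead_kb"
        try:
            settings[dev.name] = int(knob.read_text().strip())
        except (OSError, ValueError):
            continue  # device without a queue directory, or unreadable value
    return settings


if __name__ == "__main__":
    for name, kb in read_ahead_settings().items():
        print(f"{name}: read_ahead_kb={kb}")
```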
   
   ### What Did You Expect to See?
   
   The `correctMinOffset` binary search should not cause excessive disk I/O or 
impact send latency.
   
   ### What Did You See Instead?
   
   Periodic disk read pulses (~975MB per cycle) and send p99 spikes (4ms -> 
26ms) every 8-9 minutes, correlated with `StoreCleanQueueScheduledThread` 
running `correctMinOffset`.
   
   ### Additional Context
   
   Root cause: `madvise(MADV_RANDOM)` is not applied before the binary search, 
so kernel read-ahead stays active for a random access pattern. 
`posix_fadvise(POSIX_FADV_RANDOM)` does NOT help here: on Linux 2.6.35+, the 
mmap read-ahead path (`do_sync_mmap_readahead`) only checks `VM_RAND_READ` (set 
by `madvise`), not `FMODE_RANDOM` (set by `fadvise`).
   
   Fix: call `madvise(MADV_RANDOM)` on the mapping before the binary search and 
`madvise(MADV_NORMAL)` after it, gated by a config switch 
`correctMinOffsetMadviseEnable` (default: off).
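
   The wrapping pattern can be illustrated with Python's standard `mmap` 
module, which exposes `madvise` on Linux (Python 3.8+). This is a sketch of the 
idea, not the Broker change itself: the 20-byte record layout is simplified, 
the function names are hypothetical, and the Broker would need its own native 
binding to issue the same calls from Java.

```python
import mmap
import struct
import tempfile

ENTRY_SIZE = 20  # simplified record: 8-byte big-endian key + 12 bytes padding


def first_at_or_after(mm, count, target):
    """Binary search: index of the first record whose key is >= target."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        (key,) = struct.unpack_from(">q", mm, mid * ENTRY_SIZE)
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def search_with_random_hint(mm, count, target):
    """Hint random access only for the duration of the search."""
    hinted = hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM")
    if hinted:
        mm.madvise(mmap.MADV_RANDOM)      # suppress read-ahead on this mapping
    try:
        return first_at_or_after(mm, count, target)
    finally:
        if hinted:
            mm.madvise(mmap.MADV_NORMAL)  # restore default read-ahead behaviour


def demo():
    """Build a small sorted file, map it read-only, and search it."""
    count = 1000
    with tempfile.TemporaryFile() as f:
        for i in range(count):
            f.write(struct.pack(">q", i * 10) + b"\x00" * 12)
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return search_with_random_hint(mm, count, 4995)


if __name__ == "__main__":
    print(demo())
```

   Restoring `MADV_NORMAL` in a `finally` block keeps sequential readers of the 
same mapping unaffected even if the search raises.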

