3424672656 opened a new issue, #10165:
URL: https://github.com/apache/rocketmq/issues/10165

   ### Before Creating the Bug Report
   
   - [x] I found a bug, not just asking a question, which should be created in 
[GitHub Discussions](https://github.com/apache/rocketmq/discussions).
   
   - [x] I have searched the [GitHub 
Issues](https://github.com/apache/rocketmq/issues) and [GitHub 
Discussions](https://github.com/apache/rocketmq/discussions)  of this 
repository and believe that this is not a duplicate.
   
   - [x] I have confirmed that this bug belongs to the current repository, not 
other repositories of RocketMQ.
   
   
   ### Runtime platform environment
   
   ubuntu
   
   ### RocketMQ version
   
   develop
   
   ### JDK Version
   
   1.8
   
   ### Describe the Bug
   
   ## Motivation
   
   When switching from the file-based timer engine to the RocksDB timer engine 
via `switchTimerEngine`, the `checkAndReviseMetrics` scheduled task in 
`TimerMessageStore` keeps running because it has no guard for the engine 
switch. As a result, RocksDB-side timer metrics are incorrectly overwritten.
   
   ## Root Cause
   
   1. **Shared TimerMetrics**: Both `TimerMessageStore` (file-based) and 
`TimerMessageRocksDBStore` (RocksDB) 
      share the same `TimerMetrics` object.
   
   2. **No switch guard in scheduler**: The `checkAndReviseMetrics` scheduled 
task registered in 
      `TimerMessageStore.start()` has no check for `timerStopEnqueue` or 
`timerRocksDBEnable`. After 
      `switchTimerEngine(ROCKSDB_TIMELINE)` sets `timerStopEnqueue=true`, the 
scheduler still fires.
   
   3. **Overwrite via putAll**: `checkAndReviseMetrics()` only traverses 
`timerLog` (file-based data) to 
      rebuild metric counts for "small" topics, then calls 
`timerMetrics.getTimingCount().putAll(newSmallOnes)`. 
      Since RocksDB-side data is not in `timerLog`, any topic with metrics from 
RocksDB gets overwritten to 0 
      (or loses the RocksDB portion for shared topics).
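
The overwrite in step 3 can be illustrated with a minimal sketch. It is not the actual `TimerMessageStore` code: the class `PutAllOverwriteSketch`, the method `reviseFromFileLog`, and the plain `Map<String, Long>` counters are hypothetical stand-ins, assuming only that the revise step recounts topics from file-based data and writes the result back with `putAll`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PutAllOverwriteSketch {

    // Hypothetical simplification of the revise step: recount every known topic
    // from file-based data only, then write the recount back with putAll.
    static Map<String, Long> reviseFromFileLog(Map<String, Long> sharedCounts,
                                               Map<String, Long> fileLogCounts) {
        Map<String, Long> newSmallOnes = new HashMap<>();
        for (String topic : sharedCounts.keySet()) {
            // Topics that exist only on the RocksDB side are absent from the file log.
            newSmallOnes.put(topic, fileLogCounts.getOrDefault(topic, 0L));
        }
        // putAll replaces the existing value for every key it contains.
        sharedCounts.putAll(newSmallOnes);
        return sharedCounts;
    }

    public static void main(String[] args) {
        Map<String, Long> shared = new ConcurrentHashMap<>();
        shared.put("TopicA", 5L); // counted entirely by the RocksDB engine
        shared.put("TopicB", 7L); // 3 from the file engine + 4 from RocksDB

        Map<String, Long> fileLog = new HashMap<>();
        fileLog.put("TopicB", 3L); // the file log only knows its own 3 messages

        reviseFromFileLog(shared, fileLog);
        System.out.println(shared.get("TopicA")); // prints 0
        System.out.println(shared.get("TopicB")); // prints 3
    }
}
```

In this sketch the RocksDB-only topic drops to 0 and the shared topic keeps only its file-based portion, which matches the two failure modes described in step 3.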
   
   ### Timeline
   
   
   
   ### Steps to Reproduce
   
   
   ## Fix
   
   Add a `storeConfig.isTimerStopEnqueue()` guard in the 
`checkAndReviseMetrics` scheduled task. When the 
   file-based engine has stopped enqueuing (indicating a switch to RocksDB), 
skip `checkAndReviseMetrics` 
   to prevent overwriting RocksDB-side metrics.
   
   **Why `timerStopEnqueue`?**
   - `switchTimerEngine` always sets `timerStopEnqueue=true` when switching to 
RocksDB
   - When switching back to file-based, it sets `timerStopEnqueue=false`, so 
`checkAndReviseMetrics` resumes
   - The semantics are precise: "file-based engine has stopped, should not 
revise file-based metrics"
   - Minimal change, no new config flags needed
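
The guard described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the actual patch: `ReviseGuardSketch`, `scheduledTick`, and the `AtomicBoolean` flag are hypothetical stand-ins for the scheduler task and `storeConfig.isTimerStopEnqueue()`, and the `checkAndReviseMetrics` body here only counts invocations.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ReviseGuardSketch {

    // Hypothetical stand-in for storeConfig.isTimerStopEnqueue().
    static final AtomicBoolean timerStopEnqueue = new AtomicBoolean(false);

    // Counts how often the revise logic actually ran.
    static final AtomicInteger reviseRuns = new AtomicInteger();

    // Placeholder for the real checkAndReviseMetrics(); it only records the call.
    static void checkAndReviseMetrics() {
        reviseRuns.incrementAndGet();
    }

    // Body of the periodic task: skip the revise step while the file-based
    // engine has stopped enqueuing (that is, after a switch to RocksDB).
    static void scheduledTick() {
        if (timerStopEnqueue.get()) {
            return;
        }
        checkAndReviseMetrics();
    }

    public static void main(String[] args) {
        scheduledTick();               // file-based engine active: revise runs
        timerStopEnqueue.set(true);    // simulates switching to RocksDB
        scheduledTick();               // skipped
        timerStopEnqueue.set(false);   // simulates switching back to file-based
        scheduledTick();               // revise runs again
        System.out.println(reviseRuns.get()); // prints 2
    }
}
```

Because the flag is read on every tick rather than once at startup, the task resumes on its own when the engine is switched back, without cancelling or re-registering the scheduler.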
   
   ## Changes
   
   ### 
`store/src/main/java/org/apache/rocketmq/store/timer/TimerMessageStore.java`
   
   Added `timerStopEnqueue` check in the scheduler task before calling 
`checkAndReviseMetrics()`:
   
   
   
   ### What Did You Expect to See?
   
   After switching the timer engine, the timer metrics should remain correct.
   
   ### What Did You See Instead?
   
   null
   
   ### Additional Context
   
   _No response_

