qianye1001 commented on issue #10512:
URL: https://github.com/apache/rocketmq/issues/10512#issuecomment-4704243241

   ## Root cause of the 81K log lines: watermark gap is too narrow
   
   The issue correctly identifies the log noise, but the **root cause** is 
worth calling out: the fix should go beyond a log-level downgrade.
   
   ### Why `channelWritabilityChanged` fires ~900 times/sec
   
   The current code (`NettyRemotingServer.java:579-593`) implements 
backpressure by toggling `autoRead` on writability changes:
   
   - `channel.bytesBeforeUnwritable() == 0` (the channel has become 
unwritable) → set `autoRead(false)` (stop reading)
   - `channel.isWritable()` is true again → set `autoRead(true)` (resume 
reading)
   
   The default watermark values in `NettySystemConfig` are both **0**:
   
   ```java
   writeBufferHighWaterMark = Integer.parseInt(System.getProperty(..., "0"));
   writeBufferLowWaterMark  = Integer.parseInt(System.getProperty(..., "0"));
   ```
   
   Since `NettyRemotingServer.java:317` only sets custom watermarks when both 
are `> 0`, the broker falls back to **Netty's defaults: high = 64 KB, low = 32 
KB**.
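
To make the fallback concrete, here is a minimal sketch of the selection logic as I read it. The class and method names (`EffectiveWatermarks`, `resolve`) are hypothetical and only model the behaviour described above; this is not the actual RocketMQ code.

```java
// Hypothetical model of the fallback described above; not the RocketMQ source.
final class EffectiveWatermarks {
    // Netty's default write-buffer watermarks (32 KB low, 64 KB high).
    static final int NETTY_DEFAULT_LOW = 32 * 1024;
    static final int NETTY_DEFAULT_HIGH = 64 * 1024;

    final int low;
    final int high;

    private EffectiveWatermarks(int low, int high) {
        this.low = low;
        this.high = high;
    }

    // Custom values are used only when BOTH are > 0; otherwise Netty's defaults apply.
    static EffectiveWatermarks resolve(int configuredLow, int configuredHigh) {
        if (configuredLow > 0 && configuredHigh > 0) {
            return new EffectiveWatermarks(configuredLow, configuredHigh);
        }
        return new EffectiveWatermarks(NETTY_DEFAULT_LOW, NETTY_DEFAULT_HIGH);
    }

    // Hysteresis band: how far the buffer must drain before the channel is writable again.
    int gap() {
        return high - low;
    }

    public static void main(String[] args) {
        EffectiveWatermarks w = resolve(0, 0); // both defaults are 0 today
        System.out.println("low=" + w.low + " high=" + w.high + " gap=" + w.gap());
    }
}
```

With both configured values at 0, this model resolves to the 64 KB / 32 KB pair, i.e. a 32 KB hysteresis band.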
   
   Under high throughput, a 32 KB gap between high and low is far too narrow: 
a handful of messages can push the write buffer past 64 KB (→ unwritable), and 
the socket only has to drain a little over 32 KB to bring it back below 32 KB 
(→ writable again). This causes rapid oscillation between the two states, which 
is the direct cause of the log storm.
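
The oscillation can be illustrated with a toy simulation. This is a rough sketch under stated assumptions, not a measurement: `WatermarkFlapModel`, the per-tick byte rates, and the tick count are all made up for illustration, and real Netty accounting (per-channel outbound buffer, event-loop timing) is more involved. The only parts taken from Netty's behaviour are the two threshold rules: unwritable once pending bytes exceed the high mark, writable again once they drop below the low mark.

```java
// Toy model only: rates and tick granularity are assumptions, not measurements.
final class WatermarkFlapModel {
    /**
     * Counts writability flips over the given number of ticks.
     * While writable, the server keeps reading and queues inPerTick bytes of responses;
     * while unwritable, autoRead is off and nothing new is queued.
     * The socket drains outPerTick bytes on every tick.
     */
    static int countFlips(long low, long high, long inPerTick, long outPerTick, int ticks) {
        long pending = 0;
        boolean writable = true;
        int flips = 0;
        for (int t = 0; t < ticks; t++) {
            if (writable) {
                pending += inPerTick;
            }
            pending = Math.max(0, pending - outPerTick);
            if (writable && pending > high) {
                writable = false; // corresponds to autoRead(false)
                flips++;
            } else if (!writable && pending < low) {
                writable = true;  // corresponds to autoRead(true)
                flips++;
            }
        }
        return flips;
    }

    public static void main(String[] args) {
        long in = 16 * 1024;  // assumed bytes queued per tick while reading
        long out = 8 * 1024;  // assumed bytes drained per tick
        int ticks = 10_000;
        System.out.println("64 KB / 32 KB flips: "
                + countFlips(32 * 1024, 64 * 1024, in, out, ticks));
        System.out.println("1 MB / 512 KB flips: "
                + countFlips(512 * 1024, 1024 * 1024, in, out, ticks));
    }
}
```

Under these assumed rates, the narrow band flips far more often than the wide one over the same number of ticks, and every flip corresponds to one `channelWritabilityChanged` event (and one log line).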
   
   ### Suggested fix (in addition to the log downgrade)
   
   Increase the default watermark values so the backpressure mechanism doesn't 
thrash:
   
   ```java
   // e.g., 1 MB high / 512 KB low, which gives 512 KB of hysteresis
   writeBufferHighWaterMark = Integer.parseInt(System.getProperty(..., "1048576"));
   writeBufferLowWaterMark  = Integer.parseInt(System.getProperty(..., "524288"));
   ```
   
   This is analogous to a thermostat with a 1-degree deadband vs. a 10-degree 
deadband: a wider gap prevents rapid on/off cycling.
   
   **The log downgrade (to DEBUG) is still a good idea** as a defense-in-depth 
measure, but fixing the watermark defaults addresses the actual problem: the 
backpressure mechanism shouldn't be oscillating hundreds of times per second in 
the first place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
