qianye1001 commented on issue #10512: URL: https://github.com/apache/rocketmq/issues/10512#issuecomment-4704243241
## Root cause of the 81K log lines: the watermark gap is too narrow

The issue correctly identifies the log noise, but I think it's worth calling out the **root cause**: the fix should go beyond a log-level downgrade.

### Why `channelWritabilityChanged` fires ~900 times/sec

The current code (`NettyRemotingServer.java:579-593`) implements backpressure by toggling `autoRead` on writability changes:

- `channel.bytesBeforeUnwritable() == 0` → set `autoRead(false)` (stop reading)
- `channel.isWritable()` again → set `autoRead(true)` (resume reading)

The default watermark values in `NettySystemConfig` are both **0**:

```java
writeBufferHighWaterMark = Integer.parseInt(System.getProperty(..., "0"));
writeBufferLowWaterMark = Integer.parseInt(System.getProperty(..., "0"));
```

Since `NettyRemotingServer.java:317` only sets custom watermarks when both are `> 0`, the broker falls back to **Netty's defaults: high = 64 KB, low = 32 KB**.

Under high throughput, a 32 KB gap between the high and low marks is far too narrow. A handful of messages can push the write buffer past 64 KB (→ unwritable), and the socket only needs to drain about 32 KB to bring it back below 32 KB (→ writable again). This rapid oscillation between the two states is the direct cause of the log storm.

### Suggested fix (in addition to the log downgrade)

Increase the default watermark values so that the backpressure mechanism doesn't thrash:

```java
// e.g., 1 MB high / 512 KB low, which gives 512 KB of hysteresis
writeBufferHighWaterMark = Integer.parseInt(System.getProperty(..., "1048576"));
writeBufferLowWaterMark = Integer.parseInt(System.getProperty(..., "524288"));
```

This is analogous to a thermostat with a 1-degree deadband versus a 10-degree deadband: the wider gap prevents rapid on/off cycling.

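
The hysteresis argument can be made concrete with a small, self-contained Java simulation of the writability/`autoRead` feedback loop. This is a toy model under assumed conditions, not RocketMQ or Netty code. On each tick, while `autoRead` is on, the server enqueues a hypothetical 16 KB of responses. The socket drains a hypothetical 8 KB every tick. The channel is assumed to turn unwritable once pending bytes exceed the high mark and writable again once they fall below the low mark, and the handler toggles `autoRead` on each flip. `WatermarkSim` and `countTransitions` are illustrative names.

```java
// Toy model of the writability / autoRead feedback loop (not RocketMQ or Netty code).
// Assumed workload: a fixed number of bytes enqueued per tick while autoRead is on,
// and a fixed number of bytes drained by the socket on every tick.
final class WatermarkSim {

    /** Counts writability flips (in either direction) over {@code ticks} steps. */
    static int countTransitions(long high, long low, long enqueuePerTick, long drainPerTick, int ticks) {
        long pending = 0;         // bytes waiting in the outbound buffer
        boolean writable = true;  // mirrors Channel.isWritable()
        boolean autoRead = true;  // toggled by the backpressure handler
        int transitions = 0;

        for (int t = 0; t < ticks; t++) {
            if (autoRead) {
                pending += enqueuePerTick;           // responses to requests read this tick
                if (writable && pending > high) {    // crossed the high mark: unwritable
                    writable = false;
                    autoRead = false;                // handler stops reading
                    transitions++;
                }
            }
            pending = Math.max(0, pending - drainPerTick); // socket drains some bytes
            if (!writable && pending < low) {        // fell below the low mark: writable again
                writable = true;
                autoRead = true;                     // handler resumes reading
                transitions++;
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        int ticks = 10_000;
        int narrow = countTransitions(64 * 1024, 32 * 1024, 16 * 1024, 8 * 1024, ticks);
        int wide = countTransitions(1024 * 1024, 512 * 1024, 16 * 1024, 8 * 1024, ticks);
        System.out.println("64 KB / 32 KB marks: " + narrow + " transitions"); // 1999
        System.out.println("1 MB / 512 KB marks: " + wide + " transitions");   // 152
    }
}
```

With these assumed rates, the 64 KB / 32 KB marks flip every 5 ticks and the 1 MB / 512 KB marks flip every 65 ticks, so over 10,000 ticks `main` prints 1999 versus 152 transitions. The cycle length grows roughly in proportion to the gap between the marks, which is the deadband effect from the thermostat analogy.
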
**The log downgrade (to DEBUG) is still a good idea** as a defense-in-depth measure, but fixing the watermark defaults addresses the actual problem: the backpressure mechanism shouldn't be oscillating hundreds of times per second in the first place.
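
Here is a minimal sketch of the log-downgrade half: the `autoRead` toggle from the bullets above, with its per-flip messages moved to a debug level. It is not the actual `NettyRemotingServer` code. `BackpressureChannel` is a hypothetical stand-in for the few `Channel` methods the handler touches. `java.util.logging`'s `FINE` level plays the role of DEBUG, whereas RocketMQ uses its own logging setup.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the few io.netty.channel.Channel methods the handler uses.
interface BackpressureChannel {
    boolean isWritable();
    long bytesBeforeUnwritable();
    boolean isAutoRead();
    void setAutoRead(boolean autoRead);
    String remoteAddress();
}

// Sketch of the writability-driven autoRead toggle with its log lines demoted to FINE
// (java.util.logging's closest equivalent of DEBUG). Not the actual RocketMQ handler.
final class WritabilityHandler {
    private static final Logger LOG = Logger.getLogger(WritabilityHandler.class.getName());

    /** Invoked on every writability flip, like channelWritabilityChanged in Netty. */
    void onWritabilityChanged(BackpressureChannel ch) {
        if (ch.bytesBeforeUnwritable() == 0) {
            ch.setAutoRead(false); // outbound buffer over the high mark: stop reading requests
            if (LOG.isLoggable(Level.FINE)) { // skip string building when FINE is disabled
                LOG.fine("Channel " + ch.remoteAddress() + " unwritable, autoRead disabled");
            }
        } else if (ch.isWritable() && !ch.isAutoRead()) {
            ch.setAutoRead(true); // buffer drained below the low mark: resume reading
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("Channel " + ch.remoteAddress() + " writable again, autoRead enabled");
            }
        }
    }

    public static void main(String[] args) {
        final boolean[] state = {false, true}; // {writable, autoRead} of a fake channel
        BackpressureChannel fake = new BackpressureChannel() {
            public boolean isWritable() { return state[0]; }
            public long bytesBeforeUnwritable() { return state[0] ? 32 * 1024 : 0; }
            public boolean isAutoRead() { return state[1]; }
            public void setAutoRead(boolean autoRead) { state[1] = autoRead; }
            public String remoteAddress() { return "127.0.0.1:54321"; }
        };
        WritabilityHandler handler = new WritabilityHandler();
        handler.onWritabilityChanged(fake); // channel unwritable: autoRead turns off
        System.out.println("autoRead after unwritable: " + state[1]); // false
        state[0] = true;
        handler.onWritabilityChanged(fake); // channel writable again: autoRead turns on
        System.out.println("autoRead after writable: " + state[1]);   // true
    }
}
```

Each message is guarded with `isLoggable(Level.FINE)`, so the string concatenation stays off the hot path when debug logging is disabled. That matters for as long as the channel is still flipping hundreds of times per second.
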
