P-Peaceful opened a new pull request, #3874: URL: https://github.com/apache/hertzbeat/pull/3874
Related to #3855

## What's changed?

Fix inconsistency in Redis Cluster monitor regarding master-slave relationships.

When retrieving cluster nodes, we now use `StatefulRedisClusterConnection` to obtain the full node list. However, when we need accurate per-node information (e.g., via the `INFO` command), we establish a dedicated `StatefulRedisConnection` directly to that specific node.

This change is necessary because Lettuce's cluster implementation routes keyless commands (like `INFO`) to the *default* connection (typically the first node specified in the `RedisURI`) rather than the intended target node. As a result, using only `StatefulRedisClusterConnection` could return misleading `INFO` output that doesn't reflect the actual state of the queried node.

> Every request that includes at least one key is routed based on its hash slot to the corresponding node. Commands without a key are executed on the *default* connection, which most likely points to the first provided `RedisURI`.
>
> Source: [Lettuce Wiki: Command routing](https://github.com/redis/lettuce/wiki/Redis-Cluster)

## Debug Insight: Redis Cluster Connection Mismatch

During debugging, we observed a critical inconsistency in the Redis Cluster connection handling:

- The `redisUri` field (highlighted in red) indicates that the intended target node is `redis://192.168.0.220:7004?timeout=3s`.
- However, the actual underlying Netty channel (also highlighted) shows that the connection was established to `192.168.0.220:7001`, a different node in the cluster.

This mismatch confirms our hypothesis: when using `StatefulRedisClusterConnection#doWrite()` to execute keyless commands like `INFO`, Lettuce routes the command to the default node (often the first URI provided), rather than the node originally targeted.

As a result, the monitor incorrectly reports the state of node 7001 instead of 7004.
This leads to inaccurate master-slave relationship detection and potentially misleading cluster topology information.

<img width="1161" height="474" alt="image" src="https://github.com/user-attachments/assets/542e0c39-17a9-4e72-a705-55d0cf4d0919" />

## After

The other cluster metrics are also correctly fetched for the specified node.

<img width="679" height="316" alt="image" src="https://github.com/user-attachments/assets/3435600a-10e6-4573-986c-91693a5204d3" />

<img width="1136" height="129" alt="image" src="https://github.com/user-attachments/assets/9f051fc8-0cdd-4c4d-99cd-2dc9c4825d75" />

## Checklist

- [x] I have read the [Contributing Guide](https://hertzbeat.apache.org/docs/community/code_style_and_quality_guide)
- [ ] I have written the necessary doc or comment.
- [ ] I have added the necessary unit tests and all cases have passed.

## Add or update API

- [ ] I have added the necessary [e2e tests](https://github.com/apache/hertzbeat/tree/master/e2e) and all cases have passed.
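
## Appendix: routing rule sketch

The routing rule quoted from the Lettuce wiki in the description can be illustrated in isolation. The block below is a minimal, self-contained toy model and not Lettuce's implementation: `RoutingSketch`, `route`, and the even split of slots across nodes are assumptions made for illustration, and nodes 7002 and 7003 are hypothetical (only 7001 and 7004 appear in the debug output). Hash tags (`{...}`) are ignored for brevity. It only shows why a keyless command such as `INFO` ends up on the default node regardless of which node the caller had in mind.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Toy model of cluster command routing; all names here are hypothetical. */
final class RoutingSketch {

    /** Number of hash slots in a Redis Cluster. */
    static final int SLOT_COUNT = 16384;

    /** CRC16/XMODEM (polynomial 0x1021, initial value 0), the checksum used for Redis Cluster key slots. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Hash slot of a key (hash tags are not handled in this sketch). */
    static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    /**
     * Picks the node a command would be sent to under the quoted rule:
     * a keyed command goes to the owner of the key's slot, a keyless
     * command (key == null) goes to the default node.
     */
    static String route(List<String> slotOwners, String defaultNode, String key) {
        if (key == null) {
            return defaultNode;
        }
        // Assumption: slots are split evenly across the owners, in list order.
        int index = slot(key) * slotOwners.size() / SLOT_COUNT;
        return slotOwners.get(index);
    }

    public static void main(String[] args) {
        // 7002 and 7003 are hypothetical; the first entry doubles as the default node.
        List<String> owners = List.of(
                "192.168.0.220:7001", "192.168.0.220:7002", "192.168.0.220:7003");
        String defaultNode = owners.get(0);

        // Keyed command: the target depends on the key's hash slot.
        System.out.println("GET foo -> " + route(owners, defaultNode, "foo"));

        // Keyless command: always the default node (192.168.0.220:7001 here),
        // even if the caller wanted the INFO of 192.168.0.220:7004.
        System.out.println("INFO    -> " + route(owners, defaultNode, null));
    }
}
```

Because a keyless command gives this kind of router nothing to select a node by, the approach described in this PR avoids routing altogether: it opens a dedicated `StatefulRedisConnection` to the node whose `INFO` is wanted.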
