P-Peaceful opened a new pull request, #3874:
URL: https://github.com/apache/hertzbeat/pull/3874

   Related to #3855 
   ## What's changed?
   
   
   Fix an inconsistency in the master-slave relationships reported by the Redis Cluster monitor.
   
   When retrieving cluster nodes, we now use `StatefulRedisClusterConnection` to obtain the full node list. When accurate per-node information is needed (e.g., the output of the `INFO` command), we instead open a dedicated `StatefulRedisConnection` directly to that specific node.
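A minimal, stdlib-only sketch of this per-node lookup follows. It is not the code in this PR. The Lettuce call is replaced by a hypothetical `NodeInfoFetcher` interface, and nodes are identified by hypothetical `host:port` strings. The point it illustrates is that each `INFO` reply is requested over a connection bound to one node, so the reply can safely be attributed to that node.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collects INFO output per node, using one dedicated connection per node (sketch). */
final class PerNodeInfoCollector {

    /**
     * Hypothetical stand-in for "open a dedicated connection to this node and run INFO".
     * With Lettuce this is assumed to correspond roughly to
     * RedisClient.create(RedisURI.create(host, port)).connect().sync().info("replication").
     */
    interface NodeInfoFetcher {
        String fetchInfo(String hostAndPort);
    }

    private PerNodeInfoCollector() {
    }

    /**
     * Asks each node for its own INFO through the fetcher, so every reply is stored
     * under the node it was actually requested from.
     *
     * @param clusterNodes node addresses, e.g. taken from the cluster's node list
     * @param fetcher      opens a per-node connection and returns that node's INFO text
     * @return INFO text keyed by node address, in the order of clusterNodes
     */
    static Map<String, String> collect(List<String> clusterNodes, NodeInfoFetcher fetcher) {
        Map<String, String> infoByNode = new LinkedHashMap<>();
        for (String node : clusterNodes) {
            infoByNode.put(node, fetcher.fetchInfo(node));
        }
        return infoByNode;
    }
}
```

The design choice is deliberate: the cluster connection is still the right tool for discovering the topology, but anything node-specific goes through a connection that cannot be silently redirected to the default node.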
   
   This change is necessary because Lettuce's cluster implementation routes keyless commands (like `INFO`) to the *default* connection, typically the first node specified in the `RedisURI`, rather than to the intended target node. As a result, relying only on `StatefulRedisClusterConnection` could return misleading `INFO` output that does not reflect the actual state of the queried node.
   
   > Every request that includes at least one key is routed based on its hash 
slot to the corresponding node. Commands without a key are executed on the 
*default* connection, which most likely points to the first provided 
`RedisURI`.  
   > Source: [Lettuce Wiki: Command routing](https://github.com/redis/lettuce/wiki/Redis-Cluster)
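To make the quoted routing rule concrete, here is a simplified Java model. It is not Lettuce's implementation. It only shows that a keyed command is routed by hash slot (CRC16 of the key modulo 16384, as in the Redis Cluster specification), while a keyless command such as `INFO` always lands on the default node. Hash tags (`{...}`) are ignored for brevity, and the node names and slot ranges used with it are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified model of cluster command routing (not Lettuce's code):
 * keyed commands go to the owner of the key's hash slot,
 * keyless commands go to the default node.
 */
final class ClusterRoutingModel {

    static final int SLOT_COUNT = 16384;

    /** CRC16-CCITT (XMODEM): poly 0x1021, init 0, the checksum Redis Cluster uses for key slots. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Hash slot of a key; hash tags are deliberately ignored in this sketch. */
    static int slotOf(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    private final String defaultNode;
    // Maps the first slot of each range to the node owning that range; must contain slot 0.
    private final TreeMap<Integer, String> slotRangeStarts = new TreeMap<>();

    ClusterRoutingModel(String defaultNode, Map<Integer, String> rangeStartToNode) {
        this.defaultNode = defaultNode;
        this.slotRangeStarts.putAll(rangeStartToNode);
    }

    /** Returns the node a command would be sent to; key == null models a keyless command like INFO. */
    String route(String key) {
        if (key == null) {
            return defaultNode;
        }
        return slotRangeStarts.floorEntry(slotOf(key)).getValue();
    }
}
```

With three masters owning slots 0-5460, 5461-10922 and 10923-16383, `route("foo")` picks the third master (slot 12182), but `route(null)` returns the default node no matter which node the caller had in mind. That is exactly the kind of mismatch described in the debug section.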
   
   ## Debug Insight: Redis Cluster Connection Mismatch
   During debugging, we observed a critical inconsistency in the Redis Cluster 
connection handling:
   
   In the screenshot below, the `redisUri` field (highlighted in red) indicates that the intended target node is `redis://192.168.0.220:7004?timeout=3s`. However, the underlying Netty channel (also highlighted) shows that the connection was actually established to `192.168.0.220:7001`, a different node in the cluster.

   This mismatch confirms our hypothesis: when `StatefulRedisClusterConnection#doWrite()` is used to execute keyless commands such as `INFO`, Lettuce routes the command to the default node (often the first URI provided) rather than to the node originally targeted.
   
   As a result:

   - The monitor incorrectly reports the state of node 7001 instead of 7004.
   - This leads to inaccurate master-slave relationship detection and potentially misleading cluster topology information.
   <img width="1161" height="474" alt="image" src="https://github.com/user-attachments/assets/542e0c39-17a9-4e72-a705-55d0cf4d0919" />
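The master-slave relationship is derived from the `# Replication` section of `INFO`. The sketch below is a hypothetical `ReplicationInfo` parser, not HertzBeat's actual parsing code. It reads `role`, `master_host`/`master_port` and `connected_slaves`, and shows why a misrouted reply is harmful: if the `INFO` meant for 7004 is answered by 7001, every parsed field describes 7001, and the topology built from it is wrong.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical parser for the "# Replication" section of Redis INFO output.
 * Only the fields needed to derive master-slave relationships are kept.
 */
final class ReplicationInfo {

    final String role;           // "master" or "slave" as reported by INFO ("unknown" if absent)
    final String masterAddress;  // "host:port" of the master when role is slave, otherwise null
    final int connectedSlaves;   // number of attached replicas (0 if the field is absent)

    private ReplicationInfo(String role, String masterAddress, int connectedSlaves) {
        this.role = role;
        this.masterAddress = masterAddress;
        this.connectedSlaves = connectedSlaves;
    }

    static ReplicationInfo parse(String info) {
        Map<String, String> fields = new HashMap<>();
        for (String rawLine : info.split("\r?\n")) {
            String line = rawLine.trim();
            // Skip blank lines and section headers such as "# Replication".
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the first ':' only, so values like "ip=...,port=..." stay intact.
            int sep = line.indexOf(':');
            if (sep > 0) {
                fields.put(line.substring(0, sep), line.substring(sep + 1));
            }
        }
        String role = fields.getOrDefault("role", "unknown");
        String masterAddress = null;
        if ("slave".equals(role) && fields.containsKey("master_host") && fields.containsKey("master_port")) {
            masterAddress = fields.get("master_host") + ":" + fields.get("master_port");
        }
        int connectedSlaves = Integer.parseInt(fields.getOrDefault("connected_slaves", "0"));
        return new ReplicationInfo(role, masterAddress, connectedSlaves);
    }
}
```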
   
   ## After
   With this change, the other cluster metrics are also fetched correctly for the specified node:
   <img width="679" height="316" alt="image" src="https://github.com/user-attachments/assets/3435600a-10e6-4573-986c-91693a5204d3" />
   <img width="1136" height="129" alt="image" src="https://github.com/user-attachments/assets/9f051fc8-0cdd-4c4d-99cd-2dc9c4825d75" />
   
   
   ## Checklist
   
   - [x]  I have read the [Contributing Guide](https://hertzbeat.apache.org/docs/community/code_style_and_quality_guide)
   - [ ]  I have written the necessary doc or comment.
   - [ ]  I have added the necessary unit tests and all cases have passed.
   
   ## Add or update API
   
   - [ ] I have added the necessary [e2e tests](https://github.com/apache/hertzbeat/tree/master/e2e) and all cases have passed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

