YCharanGowda opened a new pull request, #968:
URL: https://github.com/apache/tomcat/pull/968

   ## Problem
   In Apache Tomcat's clustering feature, the `ReplicationValve` sends 
session replication messages to cluster nodes. Previously, all send 
operations (invalid sessions, session replication, and cross-context sessions) 
were wrapped in a single `try-catch` block. If any send failed (e.g., due to 
network issues or node unavailability), control jumped straight to the `catch` 
block and the remaining sends were skipped, reducing cluster reliability and 
potentially leaving session data inconsistent across nodes.
   
   This was noted in a FIXME comment: "we have a lot of sends, but the trouble 
with one node stops the correct replication to other nodes!"
   
   ## Solution
   - Split the single `try-catch` block in `sendReplicationMessage()` into 
individual `try-catch` blocks for each send operation (`sendInvalidSessions`, 
`sendSessionReplicationMessage`, and `sendCrossContextSession`).
   - This ensures that if one send fails, the others still run, improving 
fault tolerance without changing behavior when every send succeeds.
   - Added specific error messages in `LocalStrings.properties` for better 
logging and diagnostics:
     - `ReplicationValve.send.replication.failure`
     - `ReplicationValve.send.crosscontext.failure`
   - Removed the FIXME comments since the issue is now addressed.
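
   The per-send isolation described above can be sketched roughly as follows. 
This is a minimal standalone illustration, not the actual patch: the class 
`IsolatedSendSketch`, the method `sendAll`, the `Runnable` parameters, and the 
failure labels are all hypothetical stand-ins for the real valve methods and 
their log calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of Tomcat's API.
class IsolatedSendSketch {

    /**
     * Runs three send operations in order, each in its own try-catch,
     * so a failure in one does not prevent the others from running.
     * Returns a description of every failure instead of logging it.
     */
    static List<String> sendAll(Runnable invalidSessions,
                                Runnable replication,
                                Runnable crossContext) {
        List<String> failures = new ArrayList<>();

        try {
            invalidSessions.run();
        } catch (Exception e) {
            // The real valve would log here; the sketch records the failure.
            failures.add("invalid: " + e.getMessage());
        }

        try {
            replication.run();
        } catch (Exception e) {
            failures.add("replication: " + e.getMessage());
        }

        try {
            crossContext.run();
        } catch (Exception e) {
            failures.add("crosscontext: " + e.getMessage());
        }

        return failures;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        List<String> failures = sendAll(
                () -> sent.add("invalid"),
                () -> { throw new IllegalStateException("node down"); },
                () -> sent.add("crosscontext"));
        System.out.println("sent=" + sent + " failures=" + failures);
    }
}
```

   In this sketch the second send throws, yet the first and third still 
complete and only the failed one is reported. With a single shared `try` 
block, the third send would never have been attempted.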
   
   ## Files Changed
   - `java/org/apache/catalina/ha/tcp/ReplicationValve.java`: Modified 
`sendReplicationMessage()` method
   - `java/org/apache/catalina/ha/tcp/LocalStrings.properties`: Added new error 
message keys
   
   ## Impact
   - **Positive**: Enhances robustness in high-availability setups where 
network failures are common.
   - **Risk**: Low – no functional changes for successful sends; only improves 
error handling.
   - **Backward Compatible**: Yes, no breaking changes.
   
   ## Testing
   - Verified compilation without errors
   - Changes are minimal and isolated to error handling paths
   - Recommend testing in a clustered environment to confirm sends continue on 
failure
   
   ## Related
   - Resolves FIXME in `ReplicationValve.java` (line 373)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

