Shilun Fan created RATIS-2415:
---------------------------------

             Summary:  Fix queue corruption in NettyRpcProxy when request 
sending fails
                 Key: RATIS-2415
                 URL: https://issues.apache.org/jira/browse/RATIS-2415
             Project: Ratis
          Issue Type: Bug
            Reporter: Shilun Fan
            Assignee: Shilun Fan


*Summary*
NettyRpcProxy.Connection.offer() adds a CompletableFuture to the replies 
queue before calling writeAndFlush(). If writeAndFlush() throws an 
AlreadyClosedException (or fails asynchronously), the future remains in the 
queue, which leaks memory and causes later replies to be matched to the 
wrong futures.
 
*Root Cause*
{code:java}
synchronized ChannelFuture offer(...) {
    replies.offer(reply); // Step 1: enqueue
    return client.writeAndFlush(request); // Step 2: may throw exception
}
{code}
If Step 2 fails, Step 1 is not rolled back, leaving the queue corrupted.
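
One possible shape for the rollback is sketched below. This is a simplified, stdlib-only model and not the actual Ratis patch: RollbackConnectionSketch, the writer function, and the String request/reply types are hypothetical stand-ins for NettyRpcProxy.Connection, client.writeAndFlush(), and the real protocol messages. It assumes a failed write should both remove the future from the queue and complete it exceptionally.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical stand-in for NettyRpcProxy.Connection; not the Ratis code. */
final class RollbackConnectionSketch {
  // Pending reply futures, in the order their requests were sent.
  private final Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
  // Hypothetical transport: may throw synchronously, or return a future
  // that later completes exceptionally (asynchronous write failure).
  private final Function<String, CompletableFuture<Void>> writer;

  RollbackConnectionSketch(Function<String, CompletableFuture<Void>> writer) {
    this.writer = writer;
  }

  synchronized CompletableFuture<Void> offer(
      String request, CompletableFuture<String> reply) {
    replies.offer(reply); // Step 1: enqueue
    final CompletableFuture<Void> write;
    try {
      write = writer.apply(request); // Step 2: may throw
    } catch (RuntimeException e) {
      rollback(reply, e); // undo Step 1 on a synchronous failure
      throw e;
    }
    // Undo Step 1 if the write fails asynchronously.
    write.whenComplete((v, e) -> {
      if (e != null) {
        rollback(reply, e);
      }
    });
    return write;
  }

  private synchronized void rollback(
      CompletableFuture<String> reply, Throwable cause) {
    replies.remove(reply); // identity-based removal of this exact future
    reply.completeExceptionally(cause);
  }

  synchronized CompletableFuture<String> pollReply() {
    return replies.poll();
  }

  synchronized int pending() {
    return replies.size();
  }
}
```

With this shape, a request whose write fails never leaves an entry behind, so the futures still in the queue stay aligned with the requests that actually reached the network.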


*Reproduction Scenario*
1. Send request1 → success, queue=[future1], network=[request1]
2. Send request2 → writeAndFlush throws exception, queue=[future1,future2], 
network=[request1]
3. Send request3 → success, queue=[future1,future2,future3], 
network=[request1,request3]
4. Server returns response1, response3
5. Client receives response1 → pollReply() gets future1 ✅
6. Client receives response3 → pollReply() gets future2 ❌ (mismatch!)
7. future3 is never completed and eventually times out
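
The steps above can be replayed with a small stdlib-only simulation. It is only a model of the queue behavior, not Ratis or Netty code: QueueMismatchDemo, the in-memory network list, and the string responses are hypothetical, and the server is assumed to answer every request it receives, in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical simulation of the buggy enqueue-then-write ordering. */
final class QueueMismatchDemo {
  static List<CompletableFuture<String>> run() {
    Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
    List<String> network = new ArrayList<>();
    List<CompletableFuture<String>> futures = new ArrayList<>();

    for (int i = 1; i <= 3; i++) {
      CompletableFuture<String> future = new CompletableFuture<>();
      futures.add(future);
      replies.offer(future); // Step 1 always happens
      if (i != 2) {
        network.add("request" + i); // Step 2 fails for request2 only
      }
      // No rollback on failure: future2 stays in the queue.
    }

    // The server replies, in order, to each request that reached it.
    // Each reply completes whatever future is at the head of the queue.
    for (String request : network) {
      replies.poll().complete(request.replace("request", "response"));
    }
    return futures;
  }
}
```

In this model future1 receives response1, future2 receives response3, and future3 is left incomplete, matching steps 5 to 7.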



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
