Shilun Fan created RATIS-2415:
---------------------------------
Summary: Fix queue corruption in NettyRpcProxy when request sending fails
Key: RATIS-2415
URL: https://issues.apache.org/jira/browse/RATIS-2415
Project: Ratis
Issue Type: Bug
Reporter: Shilun Fan
Assignee: Shilun Fan
*Summary*
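NettyRpcProxy.Connection.offer() adds a CompletableFuture to the replies queue before calling writeAndFlush(). If writeAndFlush() throws an AlreadyClosedException (or the write fails asynchronously), the future stays in the queue even though its request never reached the server. The queue then holds one more future than there are outstanding requests, so the extra entry is never drained (a memory leak) and every later reply is matched to the wrong future.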
*Root Cause*
{code:java}
synchronized ChannelFuture offer(...) {
  replies.offer(reply);                 // Step 1: enqueue
  return client.writeAndFlush(request); // Step 2: may throw an exception
}
{code}
If Step 2 fails, Step 1 is not rolled back: the future stays in the queue although its request was never sent.
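One possible fix is to keep the enqueue-then-write order, so that a fast response still finds its future in the queue, but to undo Step 1 whenever Step 2 fails: both when writeAndFlush() throws and when the returned future later completes with an error. The sketch below is a simplified, self-contained model, not the actual Ratis patch. Transport is a hypothetical stand-in for the Netty channel, the String request and reply types are placeholders for the real protocol messages, and rollback() and pendingReplies() are illustrative names. The asynchronous branch assumes a failed write is reported before any later response is read on the same connection; the sketch does not enforce that ordering.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

public class RollbackOffer {
  /** Hypothetical stand-in for the channel's writeAndFlush(); not a real Ratis or Netty type. */
  interface Transport {
    CompletableFuture<Void> writeAndFlush(String request);
  }

  /** Simplified model of NettyRpcProxy.Connection; String is a placeholder for the real messages. */
  static class Connection {
    private final Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
    private final Transport client;

    Connection(Transport client) {
      this.client = client;
    }

    synchronized CompletableFuture<Void> offer(String request, CompletableFuture<String> reply) {
      replies.offer(reply); // Step 1: enqueue before writing, as in the original code
      CompletableFuture<Void> write;
      try {
        write = client.writeAndFlush(request); // Step 2: may throw
      } catch (RuntimeException e) {
        rollback(reply, e); // synchronous failure: undo Step 1, then propagate
        throw e;
      }
      // Asynchronous failure: undo Step 1 once the write is known to have failed.
      write.whenComplete((ignored, t) -> {
        if (t != null) {
          rollback(reply, t);
        }
      });
      return write;
    }

    /** Removes this exact future (CompletableFuture uses identity equality) and fails it. */
    private synchronized void rollback(CompletableFuture<String> reply, Throwable cause) {
      replies.remove(reply);
      reply.completeExceptionally(cause);
    }

    synchronized CompletableFuture<String> pollReply() {
      return replies.poll();
    }

    synchronized int pendingReplies() {
      return replies.size();
    }
  }
}
{code}

Under this model, step 2 of the reproduction scenario leaves the queue as [future1]: future2 is completed exceptionally instead of later receiving response3, and response3 is matched with future3.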
*Reproduction Scenario*
1. Send request1 → success, queue=[future1], network=[request1]
2. Send request2 → writeAndFlush() throws an exception, queue=[future1,future2], network=[request1]
3. Send request3 → success, queue=[future1,future2,future3], network=[request1,request3]
4. Server returns response1 and response3
5. Client receives response1 → pollReply() gets future1 ✅
6. Client receives response3 → pollReply() gets future2 ❌ (mismatch!)
7. future3 never completes and eventually times out
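The scenario can be replayed without Netty by using a plain ArrayDeque as the replies queue. In this hypothetical simulation, replay() forces request2's write to fail, lets the other writes succeed, and has the server answer every request that reached the network in order, each response completing whatever future is at the head of the queue. Passing true applies the rollback described under Root Cause (remove the future when its write fails); the class and method names are illustrative only.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

public class ReplyMismatchDemo {
  /** Describes a future's state: its value, "failed", or "pending". */
  static String state(CompletableFuture<String> f) {
    if (!f.isDone()) {
      return "pending";
    }
    return f.isCompletedExceptionally() ? "failed" : f.join();
  }

  /** Replays steps 1-7: request2's write fails, request1 and request3 succeed. */
  static List<String> replay(boolean rollbackOnFailure) {
    Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
    List<String> network = new ArrayList<>();
    List<CompletableFuture<String>> futures = new ArrayList<>();

    for (int i = 1; i <= 3; i++) {
      CompletableFuture<String> reply = new CompletableFuture<>();
      futures.add(reply);
      replies.offer(reply); // Step 1: enqueue
      if (i == 2) { // Step 2 fails for request2 only
        if (rollbackOnFailure) {
          replies.remove(reply);
          reply.completeExceptionally(new IllegalStateException("write failed"));
        }
        continue; // request2 never reaches the network
      }
      network.add("request" + i); // Step 2 succeeds
    }

    // The server answers in order; each response completes the head of the queue.
    for (String request : network) {
      replies.poll().complete(request.replace("request", "response"));
    }

    List<String> states = new ArrayList<>();
    for (CompletableFuture<String> f : futures) {
      states.add(state(f));
    }
    return states;
  }

  public static void main(String[] args) {
    System.out.println("without rollback: " + replay(false));
    System.out.println("with rollback: " + replay(true));
  }
}
{code}

Without rollback the futures end up as [response1, response3, pending], matching steps 5 to 7; with rollback they end up as [response1, failed, response3].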
--
This message was sent by Atlassian Jira
(v8.20.10#820010)