[ https://issues.apache.org/jira/browse/RATIS-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze resolved RATIS-2415.
-------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

The pull request is now merged.  Thanks, [~slfan1989]!

>  Fix queue corruption in NettyRpcProxy when request sending fails
> -----------------------------------------------------------------
>
>                 Key: RATIS-2415
>                 URL: https://issues.apache.org/jira/browse/RATIS-2415
>             Project: Ratis
>          Issue Type: Bug
>          Components: Netty
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: 1356_review.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Summary*
> NettyRpcProxy.Connection.offer() has a bug: a CompletableFuture is added to
> the replies queue before writeAndFlush() is called. If writeAndFlush() throws
> an AlreadyClosedException (or fails asynchronously), the future stays in the
> queue. It is never removed, which leaks memory, and because replies are
> matched to queued futures in FIFO order, later replies are delivered to the
> wrong futures.
>  
> *Root Cause*
> {code:java}
> synchronized ChannelFuture offer(...) {
>     replies.offer(reply); // Step 1: enqueue
>     return client.writeAndFlush(request); // Step 2: may throw exception
> } {code}
> If Step 2 fails, Step 1 is not rolled back, leaving the queue corrupted.
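The merged patch is not reproduced here. As an illustration of "rolling back Step 1", the following is a minimal Java sketch under a simplified model, not the Ratis code: the hypothetical `Transport` interface stands in for the Netty channel, a `CompletableFuture<Void>` stands in for the `ChannelFuture`, and a synchronous failure is modeled as an unchecked exception. The reply future is removed from the queue and completed exceptionally both when the write throws and when it later fails asynchronously.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

/** Simplified model of offer() with rollback; hypothetical, not the Ratis implementation. */
public class RollbackOffer {
  /** Hypothetical stand-in for the Netty channel: may throw, or return a future that fails later. */
  interface Transport {
    CompletableFuture<Void> writeAndFlush(String request);
  }

  static class Connection {
    // FIFO of futures awaiting replies; pollReply() hands them out in order.
    private final Deque<CompletableFuture<String>> replies = new ArrayDeque<>();
    private final Transport client;

    Connection(Transport client) {
      this.client = client;
    }

    synchronized CompletableFuture<Void> offer(String request, CompletableFuture<String> reply) {
      replies.offer(reply); // Step 1: enqueue before sending so a fast reply finds its future
      final CompletableFuture<Void> sent;
      try {
        sent = client.writeAndFlush(request); // Step 2: may throw synchronously
      } catch (RuntimeException e) {
        // Roll back Step 1. CompletableFuture does not override equals(),
        // so remove() takes out exactly this future.
        replies.remove(reply);
        reply.completeExceptionally(e);
        throw e;
      }
      // Roll back Step 1 if the write fails asynchronously.
      sent.whenComplete((ignored, error) -> {
        if (error != null) {
          synchronized (this) { // inside a lambda, `this` is the Connection
            replies.remove(reply);
          }
          reply.completeExceptionally(error);
        }
      });
      return sent;
    }

    synchronized CompletableFuture<String> pollReply() {
      return replies.poll();
    }

    synchronized int pending() {
      return replies.size();
    }
  }
}
```

Enqueuing before writing is kept so that a reply arriving immediately after the write always finds its future. The asynchronous rollback assumes the write failure is observed before any later reply is polled; this sketch does not address that ordering.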
> *Reproduction Scenario*
> 1. Send request1 → success, queue=[future1], network=[request1]
> 2. Send request2 → writeAndFlush throws an exception, queue=[future1,future2], network=[request1]
> 3. Send request3 → success, queue=[future1,future2,future3], network=[request1,request3]
> 4. Server returns response1, response3
> 5. Client receives response1 → pollReply() gets future1 ✅
> 6. Client receives response3 → pollReply() gets future2 ❌ (mismatch!)
> 7. future3 never completes and eventually times out
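To make the mismatch concrete, the reproduction steps can be replayed in plain Java. This is a hypothetical simulation, not Ratis code: `buggyOffer` mimics the enqueue-then-send order of `offer()`, an `IllegalStateException` stands in for the `AlreadyClosedException`, and the server is assumed to answer the requests it actually received, in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical replay of the reproduction scenario against a reply queue that is never rolled back. */
public class ReproScenario {
  // Mirrors the buggy offer(): enqueue first, then "send"; nothing is undone on failure.
  static void buggyOffer(Queue<CompletableFuture<String>> replies, List<String> network,
                         String request, CompletableFuture<String> reply, boolean sendFails) {
    replies.offer(reply); // Step 1: enqueue
    if (sendFails) {
      throw new IllegalStateException("simulated AlreadyClosedException"); // Step 2 fails
    }
    network.add(request); // Step 2 succeeds: the request reaches the server
  }

  /** Runs steps 1-6 and returns [future1, future2, future3]. */
  static List<CompletableFuture<String>> run() {
    Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
    List<String> network = new ArrayList<>();
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (int i = 1; i <= 3; i++) {
      futures.add(new CompletableFuture<>());
    }

    buggyOffer(replies, network, "request1", futures.get(0), false); // step 1
    try {
      buggyOffer(replies, network, "request2", futures.get(1), true); // step 2
    } catch (IllegalStateException e) {
      // The caller sees the failure, but future2 stays in the queue.
    }
    buggyOffer(replies, network, "request3", futures.get(2), false); // step 3

    // Steps 4-6: each response completes whatever future is at the head of the queue.
    for (String request : network) {
      String response = request.replace("request", "response");
      replies.poll().complete(response);
    }
    return futures;
  }

  public static void main(String[] args) {
    List<CompletableFuture<String>> f = run();
    System.out.println("future1 = " + f.get(0).getNow("<pending>")); // prints "future1 = response1"
    System.out.println("future2 = " + f.get(1).getNow("<pending>")); // prints "future2 = response3"
    System.out.println("future3 = " + f.get(2).getNow("<pending>")); // prints "future3 = <pending>"
  }
}
```

Because the failed request never reached the network, every later response is shifted by one queue slot, and the last future is left without a response (step 7).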



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
