sandeep-mst opened a new issue, #25201:
URL: https://github.com/apache/pulsar/issues/25201

   ### Search before reporting
   
   - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [x] I understand that [unsupported 
versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions)
 don't get bug fixes. I will attempt to reproduce the issue on a supported 
version of Pulsar client and Pulsar broker.
   
   
   ### User environment
   
   - master
   
   ### Issue Description
   
   There is a reentrancy bug in the Pulsar producer send path where 
`pendingMessages.clear()` can be executed after a retry message has already 
been added to `pendingMessages`. This results in the retry send’s 
CompletableFuture never being completed.
   
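   This can occur when a retry `sendAsync` is triggered synchronously from
   within a non-async `CompletableFuture` callback (such as `handle`) on a failed
   send. The callback runs on the thread that is failing the send, which still
   holds the producer mutex.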
   
   This happens in the
   [failPendingMessages](https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L2315)
   method, which usually runs on the timer thread.
   Because
   [pendingMessages.clear()](https://github.com/apache/pulsar/blob/d630394cdd02792b2dbc3a55443637a5d593a137/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L2327)
   is called after `completeExceptionally`, retry logic like the code below first
   adds `retryMessage` to `pendingMessages`, and the subsequent `clear()` then
   removes it.
   
   ```
   CompletableFuture<MessageId> firstSend = producer.sendAsync(message);

   CompletableFuture<MessageId> retrySend =
           // handle (not handleAsync) runs the callback on the thread that fails firstSend
           firstSend.handle((msgId, ex) -> {
               assertNotNull(ex, "First send must timeout");
               assertTrue(ex instanceof PulsarClientException.TimeoutException);
               return producer.sendAsync(retryMessage);
           }).thenCompose(f -> f);
   ```
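   
   The ordering problem can be modelled without Pulsar. The sketch below is a toy
   model, not `ProducerImpl` code: `ReentrantClearDemo`, `failPendingThenClear`,
   `detachThenFail` and `retryStillTracked` are hypothetical names made up for
   illustration. It assumes only that the pending queue is guarded by a reentrant
   monitor and that futures are completed while that monitor is held. The
   detach-first variant is one possible ordering that avoids dropping the retry,
   not necessarily the fix the project would choose.

   ```java
   import java.util.ArrayDeque;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Queue;
   import java.util.concurrent.CompletableFuture;

   /** Toy model of a producer's pending queue (hypothetical, not Pulsar code). */
   final class ReentrantClearDemo {
       private final Queue<CompletableFuture<String>> pending = new ArrayDeque<>();

       /** Stand-in for sendAsync: enqueue a future that an ack or a failure completes later. */
       synchronized CompletableFuture<String> sendAsync() {
           CompletableFuture<String> future = new CompletableFuture<>();
           pending.add(future);
           return future;
       }

       /** Ordering described in the report: fail every pending op, then clear the queue. */
       synchronized void failPendingThenClear(Exception ex) {
           // Iterate over a copy so a reentrant sendAsync() cannot break the iteration.
           for (CompletableFuture<String> future : new ArrayList<>(pending)) {
               // Synchronous callbacks run here, on this thread, and may call sendAsync().
               future.completeExceptionally(ex);
           }
           pending.clear(); // also drops anything a callback has just enqueued
       }

       /** Alternative ordering: detach the queue contents first, then fail them. */
       synchronized void detachThenFail(Exception ex) {
           List<CompletableFuture<String>> failed = new ArrayList<>(pending);
           pending.clear();
           for (CompletableFuture<String> future : failed) {
               future.completeExceptionally(ex); // a retry enqueued here stays in the queue
           }
       }

       synchronized int pendingCount() {
           return pending.size();
       }

       /** Runs one failed send with a synchronous retry; reports whether the retry is still queued. */
       static boolean retryStillTracked(boolean detachFirst) {
           ReentrantClearDemo producer = new ReentrantClearDemo();
           CompletableFuture<String> first = producer.sendAsync();
           // Registered before completion, so the callback runs on the completing thread.
           first.handle((id, ex) -> producer.sendAsync()).thenCompose(f -> f);

           Exception timeout = new Exception("simulated send timeout");
           if (detachFirst) {
               producer.detachThenFail(timeout);
           } else {
               producer.failPendingThenClear(timeout);
           }
           return producer.pendingCount() == 1;
       }

       public static void main(String[] args) {
           System.out.println("fail-then-clear keeps retry: " + retryStillTracked(false));
           System.out.println("detach-then-fail keeps retry: " + retryStillTracked(true));
       }
   }
   ```

   In the fail-then-clear ordering the retry future is removed from the queue
   before anything can complete it, which matches the hang described above. In the
   detach-first ordering the retry stays queued and can still be acknowledged or
   timed out later.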
   
   ### Error messages
   
   ```text
   
   ```
   
   ### Reproducing the issue
   
   Set a low send timeout on the producer (for example via `sendTimeout` on the
   producer builder) and issue the retry synchronously from the failure callback,
   as in the example above.
   
   ### Additional information
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


