danpi opened a new issue, #23765:
URL: https://github.com/apache/pulsar/issues/23765

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [X] I understand that unsupported versions don't get bug fixes. I will 
attempt to reproduce the issue on a supported version of Pulsar client and 
Pulsar broker.
   
   
   ### Version
   
   OS: CentOS 7
   JDK: 17
   Pulsar version: 3.0.7
   
   ### Minimal reproduce step
   
   1. Add a test case to the testGetMessageById method in 
PersistentTopicsTest.java.
   
   2. Specifically, you can add the following code:
   `Assert.expectThrows(PulsarAdminException.ServerSideErrorException.class, () -> {
       admin.topics().getMessageById(topicName1, id1.getLedgerId(), id1.getEntryId() + 10);
   });
   `
   
   3. Run this test case to reproduce the issue. You will encounter the 
following error:
   `Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
       at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:347)
       at org.apache.pulsar.client.admin.internal.TopicsImpl.getMessageById(TopicsImpl.java:1010)
       at org.apache.pulsar.broker.admin.PersistentTopicsTest.lambda$testGetMessageById$11(PersistentTopicsTest.java:1385)
       at org.testng.Assert.expectThrows(Assert.java:2440)
       ... 29 more
   `
   
   ### What did you expect to see?
   
   The issue occurs when querying a message that does not exist. This
   typically happens when a topic has just been created and has not yet
   received any traffic. In that case, a query for information about the
   topic may invoke this API and end in a timeout.
   
   For this scenario, I would expect the request to fail fast rather than
   block until the timeout occurs.
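
To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch using only the JDK. It is not Pulsar's implementation and not a diagnosis of the root cause: `FastFailSketch`, `ENTRIES`, `lookupBlocking`, `lookupFailFast`, and `outcome` are hypothetical names, and the assumption that a missing entry leaves a future uncompleted is only one way a request could hang until the caller's timeout.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a message lookup; not Pulsar's actual code. */
class FastFailSketch {

    // Hypothetical in-memory "ledger": entry id -> payload.
    static final Map<Long, String> ENTRIES = Map.of(0L, "msg-0", 1L, "msg-1");

    /** Problematic pattern (assumed): a missing entry never completes the future. */
    static CompletableFuture<String> lookupBlocking(long entryId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        String payload = ENTRIES.get(entryId);
        if (payload != null) {
            future.complete(payload);
        }
        // No else branch: the caller waits until its own timeout fires.
        return future;
    }

    /** Fail-fast pattern: a missing entry completes the future exceptionally at once. */
    static CompletableFuture<String> lookupFailFast(long entryId) {
        String payload = ENTRIES.get(entryId);
        if (payload == null) {
            return CompletableFuture.failedFuture(
                    new IllegalArgumentException("No message at entry " + entryId));
        }
        return CompletableFuture.completedFuture(payload);
    }

    /** Waits up to timeoutMillis for the result and reports how the call ended. */
    static String outcome(CompletableFuture<String> future, long timeoutMillis) {
        try {
            return "ok: " + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(lookupBlocking(11L), 200)); // timeout
        System.out.println(outcome(lookupFailFast(11L), 200)); // failed: No message at entry 11
        System.out.println(outcome(lookupFailFast(1L), 200));  // ok: msg-1
    }
}
```

With the fail-fast variant the caller learns immediately that the entry does not exist, so the result no longer depends on how the timeout happens to be configured.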
   
   ### What did you see instead?
   
   The getMessageById request blocks until the client-side timeout fires,
   then fails with the PulsarAdminException$TimeoutException shown above.
   
   The hidden risk is that the time spent blocked depends on the client's
   timeout setting. If the user has not configured a timeout (e.g.,
   PulsarAdmin.builder().readTimeout(5, TimeUnit.SECONDS)) or has set an
   unreasonable value, TCP connections can be left in the CLOSE_WAIT state.
   In extreme cases, this could lead to a tcp.listenOverflow, which can
   affect other functionality.
   
   The following image shows a large number of connections in the CLOSE_WAIT 
state on the broker's 8080 port:
   
![image](https://github.com/user-attachments/assets/ac845f7e-e62f-4f77-9def-e54f303dda28)
   
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

