danpi opened a new issue, #23765: URL: https://github.com/apache/pulsar/issues/23765
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Read release policy

- [X] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

### Version

OS: centos7
Jdk: 17
Pulsar version: 3.0.7

### Minimal reproduce step

1. Add a test case to the `testGetMessageById` method in `PersistentTopicsTest.java`.
2. Specifically, you can add the following code: `Assert.expectThrows(PulsarAdminException.ServerSideErrorException.class, () -> { admin.topics().getMessageById(topicName1, id1.getLedgerId(), id1.getEntryId() + 10); });`
3. Run this test case to reproduce the issue. You will encounter the following error: `Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:347) at org.apache.pulsar.client.admin.internal.TopicsImpl.getMessageById(TopicsImpl.java:1010) at org.apache.pulsar.broker.admin.PersistentTopicsTest.lambda$testGetMessageById$11(PersistentTopicsTest.java:1385) at org.testng.Assert.expectThrows(Assert.java:2440) ... 29 more`

### What did you expect to see?

The issue occurs when trying to query a non-existent message, which usually happens when a topic is newly created but hasn't received any traffic yet. In such cases, querying some information about the topic might invoke this API, leading to a timeout. For this scenario, I would expect a fast failure, rather than being blocked until the timeout occurs.

### What did you see instead?

What I observed instead is that the `getMessageById` request gets blocked until the timeout occurs.
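Until the call fails fast on its own, a caller can at least bound the wait itself. The following is only a minimal sketch using the JDK, not a fix for the blocking behavior described here: `BoundedLookup`, `Outcome`, and `awaitBounded` are hypothetical names that do not exist in Pulsar, and the never-completed future is an assumed stand-in for an asynchronous admin call that never receives a response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Pulsar client API.
public class BoundedLookup {

    /** Possible results of waiting on a future with an upper bound. */
    public enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    /**
     * Waits at most the given time for the future.
     * On timeout the future is cancelled so the caller stops waiting on it.
     */
    public static Outcome awaitBounded(CompletableFuture<?> future, long timeout, TimeUnit unit) {
        try {
            future.get(timeout, unit);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The future completed exceptionally, e.g. with a server-side error.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the wait as failed.
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for a lookup of a non-existent message that is never answered.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println(awaitBounded(neverAnswered, 200, TimeUnit.MILLISECONDS)); // prints TIMED_OUT

        // A lookup that is answered immediately.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("payload");
        System.out.println(awaitBounded(answered, 200, TimeUnit.MILLISECONDS)); // prints COMPLETED
    }
}
```

This only limits how long the caller waits. It does not release whatever the server side is still holding for the request, so it would not by itself address the connection state described in this report.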
The hidden risk is that the timeout duration is uncertain: if the user has not configured a timeout (e.g., `PulsarAdmin.builder().readTimeout(5, TimeUnit.SECONDS)`) or the configured timeout is unreasonable, TCP connections can end up in the CLOSE_WAIT state. In extreme cases, this could lead to a tcp.listenOverflow, which can affect other functionality. The following image shows a large number of connections in the CLOSE_WAIT state on the broker's port 8080:

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
