absolute8511 opened a new issue, #7056:
URL: https://github.com/apache/rocketmq/issues/7056

   ### Before Creating the Bug Report
   
   - [X] I found a bug, not just asking a question, which should be created in 
[GitHub Discussions](https://github.com/apache/rocketmq/discussions).
   
   - [X] I have searched the [GitHub 
Issues](https://github.com/apache/rocketmq/issues) and [GitHub 
Discussions](https://github.com/apache/rocketmq/discussions)  of this 
repository and believe that this is not a duplicate.
   
   - [X] I have confirmed that this bug belongs to the current repository, not 
other repositories of RocketMQ.
   
   
   ### Runtime platform environment
   
   Linux
   
   ### RocketMQ version
   
   4.9.x
   
   ### JDK Version
   
   _No response_
   
   ### Describe the Bug
   
   When the client calls `invokeSync` with a 3000 ms timeout and 2 
nameservers are configured, every call fails if the first nameserver is 
unreachable, even though the second one is healthy. 
   
   ### Steps to Reproduce
   
   
   In the `invokeSync` method, when there are 2 nameservers and the first 
one cannot be reached, the connection attempt times out after 3000 ms. 
`getAndCreateChannel` then connects to the second nameserver successfully, 
but by that point it has already taken more than 3000 ms in total. 
    
https://github.com/apache/rocketmq/blob/804f2d85f22d9ee52573b9c6ee6abae248c9b387/remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingClient.java#L531
   
   Because the elapsed time exceeds the `invokeSync` timeout, a 
`RemotingTimeoutException` is thrown and the channel to the second 
nameserver, which connected successfully, is closed. The next `invokeSync` 
call picks the first nameserver again in `getAndCreateChannel` and fails 
the same way, and so does every call after it.
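
To make the arithmetic concrete, here is a minimal simulation of the flow described above. It is only a sketch: `Server`, `demoServers`, and the 5 ms connect cost for the second nameserver are made up for illustration, and it uses a virtual clock instead of real sockets or the actual `NettyRemotingClient` code.

```java
import java.util.Arrays;
import java.util.List;

class InvokeSyncBudgetSketch {
    // Hypothetical stand-in for one nameserver: whether a connect attempt
    // succeeds and how long the attempt takes. Not a RocketMQ class.
    static final class Server {
        final String addr;
        final boolean reachable;
        final long connectCostMillis;

        Server(String addr, boolean reachable, long connectCostMillis) {
            this.addr = addr;
            this.reachable = reachable;
            this.connectCostMillis = connectCostMillis;
        }
    }

    // Assumed setup matching the report: the first nameserver times out
    // after 3000 ms, the second connects quickly (5 ms is an invented value).
    static List<Server> demoServers() {
        return Arrays.asList(
            new Server("nameserver-0:9876", false, 3000),
            new Server("nameserver-1:9876", true, 5));
    }

    // Simplified model of the reported behavior: servers are tried in order,
    // and all time spent connecting is charged against the caller's timeout.
    static boolean invokeSync(List<Server> servers, long timeoutMillis) {
        long costMillis = 0; // virtual clock, no real waiting
        Server connected = null;
        for (Server s : servers) {
            costMillis += s.connectCostMillis;
            if (s.reachable) {
                connected = s;
                break;
            }
        }
        if (connected == null) {
            return false; // no nameserver could be reached at all
        }
        // The budget is already spent by the failed first attempt, so the
        // call times out and the healthy channel is dropped. Nothing is
        // remembered, so the next call starts from the first server again.
        return timeoutMillis >= costMillis;
    }

    public static void main(String[] args) {
        List<Server> servers = demoServers();
        for (int i = 1; i <= 3; i++) {
            System.out.println("call " + i + " ok=" + invokeSync(servers, 3000));
        }
    }
}
```

In this model every call with a 3000 ms timeout fails, because 3000 ms + 5 ms of connect time already exceeds the budget, while the same call with a larger timeout such as 4000 ms would succeed.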
   
   For example, see the logs below:
   
   ```
   2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - 
createChannel: connect remote host[xxx-nameserver-0.rocketmq.svc.xxx:9876] 
timeout 3000ms, 
AbstractBootstrap$PendingRegistrationPromise@2bf472e1(uncancellable) 
   2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - 
new name server is chosen. OLD: xxx-nameserver-1.rocketmq.svc.xxx:9876 , NEW: 
xxx-nameserver-1.rocketmq.svc.xxx:9876. namesrvIndex = 87 
   2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - 
createChannel: begin to connect remote 
host[xxx-nameserver-1.rocketmq.svc.xxx:9876] asynchronously 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - NETTY CLIENT PIPELINE: 
CLOSE  
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - closeChannel: the 
channel[xxx-nameserver-0.rocketmq.svc.xxx:9876] was removed from channel table 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - NETTY CLIENT PIPELINE: 
CLOSE  
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - eventCloseChannel: the 
channel[null] has been removed from the channel table before 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: 
CONNECT  UNKNOWN => xxx-nameserver-1.rocketmq.svc.xxx.org/172.20.x.x:9876 
   2023-07-19 17:03:47 INFO 11%2381103787518_NettyClientSelector_1 - 
closeChannel: close the connection to remote address[] result: true 
   2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - 
createChannel: connect remote host[xxx-nameserver-1.rocketmq.svc.xxx:9876] 
success, AbstractBootstrap$PendingRegistrationPromise@a3c936(success) 
   2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - 
closeChannel: begin close the channel[172.20.x.x:9876] Found: false 
   2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - 
closeChannel: the channel[172.20.x.x:9876] has been removed from the channel 
table before 
   2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - 
invokeSync: close socket because of timeout, 3000ms, null 
   2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - 
invokeSync: wait response timeout exception, the channel[null] 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: 
CLOSE 172.20.x.x:9876 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - closeChannel: the 
channel[xxx-1.rocketmq.svc.xxx:9876] was removed from channel table 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: 
CLOSE 172.20.x.x:9876 
   2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - eventCloseChannel: the 
channel[null] has been removed from the channel table before 
   2023-07-19 17:03:47 INFO 11%2381103787518_NettyClientSelector_1 - 
closeChannel: close the connection to remote address[172.20.x.x:9876] result: 
true 
   ```
   
   ### What Did You Expect to See?
   
   `invokeSync` should succeed on the next call, since the second nameserver is reachable.
   
   ### What Did You See Instead?
   
   `invokeSync` kept failing for a long time.
   
   ### Additional Context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
