funky-eyes opened a new issue, #7497:
URL: https://github.com/apache/incubator-seata/issues/7497

   ### Check Ahead
   
   - [x] I have searched the [issues](https://github.com/seata/seata/issues) of 
this repository and believe that this is not a duplicate.
   
   - [ ] I am willing to try to fix this bug myself.
   
   
   ### Ⅰ. Issue Description
   
   
![Image](https://github.com/user-attachments/assets/21601370-8b51-4d60-a537-e4ce7d687a8e)
   
   
![Image](https://github.com/user-attachments/assets/9792938b-24ab-430a-a974-f17e7f045520)
   
   
![Image](https://github.com/user-attachments/assets/9657ad42-abb8-4e8f-8c4f-7ba80a0eb6d6)
   In NettyClientChannelManager.java, both the releaseChannel and acquireChannel methods synchronize on the per-address lock objects held in channelLocks. The exceptionCaught method (in the inner class ClientHandler of AbstractNettyRemotingClient.java) calls releaseChannel. Since exceptionCaught runs on a Netty I/O thread, if the client is configured with only one I/O thread, request sending and response handling are blocked at the same time and traffic drops to zero. This only occurs when the server shuts down ungracefully and a reconnect task happens to run concurrently with exceptionCaught: because the downed server node can no longer be reached, the lock is only released once the connection attempt times out, so all client requests hang for the duration of that window.
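
   To make the interaction easier to follow, here is a simplified, self-contained sketch of the pattern (not the actual Seata code; the method bodies, the 10-second "connect timeout", and the addresses are only illustrative): a per-address lock object in channelLocks guards both the reconnect path and the release path, so a releaseChannel call issued from the Netty event loop parks the I/O thread while a concurrent acquireChannel holds the same lock waiting for the connect attempt to time out.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.TimeUnit;

   // Simplified sketch of the locking interaction described above. Method names mirror
   // NettyClientChannelManager, but the bodies are stand-ins, not Seata's implementation.
   public class ChannelLockSketch {

       // serverAddress -> lock object; entries are only ever added
       private final Map<String, Object> channelLocks = new ConcurrentHashMap<>();

       // Stand-in for the reconnect path: holds the per-address lock for as long as the
       // connect attempt takes (here a sleep simulates a connect timeout against a server
       // that is already down).
       void acquireChannel(String serverAddress) throws InterruptedException {
           channelLocks.computeIfAbsent(serverAddress, k -> new Object());
           synchronized (channelLocks.get(serverAddress)) {
               TimeUnit.SECONDS.sleep(10); // simulated connect attempt that only fails after a timeout
           }
       }

       // Stand-in for the path reached from ClientHandler#exceptionCaught, i.e. it is
       // invoked on the Netty I/O thread.
       void releaseChannel(String serverAddress) {
           Object lock = channelLocks.get(serverAddress);
           if (lock == null) {
               return;
           }
           synchronized (lock) {
               // destroy the broken channel; while acquireChannel holds the lock, the calling
               // I/O thread is parked here and, with a single I/O thread, no traffic is handled
           }
       }

       public static void main(String[] args) throws Exception {
           ChannelLockSketch sketch = new ChannelLockSketch();
           String server = "10.0.0.1:8091"; // hypothetical TC address

           // reconnect task grabs the lock first and waits on the connect timeout
           Thread reconnect = new Thread(() -> {
               try {
                   sketch.acquireChannel(server);
               } catch (InterruptedException ignored) {
               }
           }, "reconnect-task");
           reconnect.start();
           Thread.sleep(200);

           // the "I/O thread" now blocks inside releaseChannel until the timeout elapses
           long start = System.nanoTime();
           sketch.releaseChannel(server);
           System.out.printf("releaseChannel blocked for %d ms%n",
                   TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
       }
   }
   ```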
   
   In addition, channelLocks never seems to be cleaned up. When the server is deployed in Kubernetes and gets a new IP address on every restart, the map keeps growing and over time the client risks running out of memory (OOM).
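
   A standalone sketch of that growth (again not Seata's code; the map, the "ip:port" key format, and the restart loop are assumptions for illustration): each distinct server address adds a lock object to channelLocks and nothing ever removes it, so a client following a server whose IP changes on every restart accumulates stale entries without bound.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   // Standalone sketch of unbounded growth: one lock object per distinct server address,
   // with no corresponding cleanup when a server address disappears.
   public class ChannelLocksGrowth {

       private static final Map<String, Object> CHANNEL_LOCKS = new ConcurrentHashMap<>();

       static void acquireChannel(String serverAddress) {
           // only ever adds an entry; there is no removal path for addresses that are gone
           CHANNEL_LOCKS.computeIfAbsent(serverAddress, k -> new Object());
       }

       public static void main(String[] args) {
           // simulate a server pod that gets a fresh IP on every restart
           for (int restart = 0; restart < 50_000; restart++) {
               String newAddress = "10.0." + (restart / 250) + "." + (restart % 250) + ":8091";
               acquireChannel(newAddress);
           }
           // the map now holds one entry per historical address, none of them reclaimable
           System.out.println("stale lock entries retained: " + CHANNEL_LOCKS.size());
       }
   }
   ```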
   
   ### Ⅱ. Describe what happened
   
   _No response_
   
   ### Ⅲ. Describe what you expected to happen
   
   _No response_
   
   ### Ⅳ. How to reproduce it (as minimally and precisely as possible)
   
   _No response_
   
   ### Ⅴ. Anything else we need to know?
   
   _No response_
   
   ### Ⅵ. Environment
   
   _No response_

