[GitHub] [rocketmq] imzs opened a new issue, #5866: Client-side memory leak of inactive channel

via GitHub Tue, 30 May 2023 23:27:40 -0700


imzs opened a new issue, #5866:
URL: https://github.com/apache/rocketmq/issues/5866

### [Abstract]
Although both the server and the client have idle-connection management, we
found a client-side memory leak of inactive channels associated with port
10909, which is hard to describe as a corner case.
### [Detail]
Here is a case in point.

![image](https://user-images.githubusercontent.com/7539566/211751922-c603e60f-18fd-46f8-a5cd-d0db27a491fb.png)
The screenshot above shows NettyRemotingClient occupying a large share of the
heap. A singleton instance should not normally grow like this. Analyzing the
heap dump with MAT quickly reveals a memory leak in `channelTables`.

![image](https://user-images.githubusercontent.com/7539566/211752052-b071820d-658b-4ec3-8de0-838828705dde.png)
Many closed channels remain referenced from the channel table, preventing
garbage collection of the channels themselves and of related objects such as
`ChannelWrapper` and `SelectionKeyImpl` instances.

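Query the heap dump with OQL. The first query selects socket channels in state
4 (`ST_KILLED`); the second restricts them to remote port 10909:

```
select * from sun.nio.ch.SocketChannelImpl t where t.state = 4
select * from sun.nio.ch.SocketChannelImpl t where t.state = 4 and toString(t.remoteAddress).endsWith(":10909")
```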

The two queries return almost the same number of entries, far more than the
number of active channels (`ST_CONNECTED`). This indicates that the application
creates many socket connections and never cleans up their resources. Server
port 10909 is used as the fast remoting port; among the TCP sockets to port
10909, there is one active connection and many inactive connections to the
same remote broker address.

Is channel reuse not working?
No. The created channel gets closed, so the client has to create a new one,
and the old one is never released, which results in the memory leak. This only
happens when an `IdleStateEvent` is triggered on the broker and the broker
closes the channel directly. For example, if a producer connects to the broker
but sends messages infrequently, the broker closes any channel that has
performed no read or write operation for 120 seconds (the default).

The reproduction procedure is therefore simple: start a local producer and send
a message every 125 seconds. A full GC (FGC) eventually occurs. That is why we
say this is not a corner case, even though it only causes serious results after
a long period of running.
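To make the timing concrete, here is a small JDK-only Java model of the reproduction. It is a sketch, not RocketMQ code: it assumes the broker closes a channel after `idleSeconds` without reads or writes, that the client only notices and reconnects lazily on its next send, and that nothing else touches this particular channel. `IdleReconnectModel` and `channelsOpened` are hypothetical names.

```java
// Hypothetical model of the reproduction: the broker closes a channel that has
// seen no read/write for idleSeconds, while the producer sends one message
// every intervalSeconds. Not RocketMQ code.
public class IdleReconnectModel {

    /**
     * Counts how many channels the client opens to one broker address over
     * durationSeconds, assuming it reconnects lazily on the next send and that
     * no other traffic (e.g. heartbeats) uses this channel.
     */
    static int channelsOpened(long idleSeconds, long intervalSeconds, long durationSeconds) {
        int opened = 0;
        boolean connected = false;
        long lastActivity = 0;
        for (long t = 0; t <= durationSeconds; t += intervalSeconds) {
            // Broker side: the idle timeout expired before this send, so the
            // broker has already closed the channel.
            if (connected && t - lastActivity >= idleSeconds) {
                connected = false;
            }
            // Client side: no usable channel, so a fresh one is created.
            if (!connected) {
                opened++;
                connected = true;
            }
            lastActivity = t; // the send counts as a read/write on the channel
        }
        return opened;
    }

    public static void main(String[] args) {
        long oneDay = 24 * 60 * 60;
        System.out.println(channelsOpened(120, 125, oneDay)); // prints 692
        System.out.println(channelsOpened(120, 60, oneDay));  // prints 1
    }
}
```

With a 125-second interval every send after the first finds the channel already closed, so over one simulated day the client opens 692 channels to the same address, versus a single channel when sending every 60 seconds.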

### [Bug Fix]
The root cause lies in the inactive event. It is an inbound event:
`channelInactive(ChannelHandlerContext ctx)` is invoked when the channel that
the `ChannelHandlerContext` is registered with leaves the active state, is no
longer connected to its remote peer, and has reached the end of its lifetime.
The solution is therefore to implement the `channelInactive()` method and close
the channel on the client side.
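The idea of the fix can be sketched without Netty so it runs with the JDK alone. `ChannelTableModel`, `FakeChannel` and `onChannelInactive` are hypothetical stand-ins for Netty's `Channel`, the client's `channelTables`, and the `channelInactive()` callback; in an actual Netty handler the equivalent would be overriding `channelInactive(ChannelHandlerContext ctx)` and closing `ctx.channel()`. The sketch only shows the principle: when the peer goes away, close the channel locally and drop its table entry immediately, rather than leaving it for a later send to discover.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified, Netty-free model of a client channel table (hypothetical names).
public class ChannelTableModel {

    // Stand-in for a Netty Channel: just an address and an open flag.
    static class FakeChannel {
        final String remoteAddr;
        private volatile boolean open = true;

        FakeChannel(String remoteAddr) { this.remoteAddr = remoteAddr; }

        boolean isActive() { return open; }

        void close() { open = false; } // idempotent
    }

    final ConcurrentMap<String, FakeChannel> channelTables = new ConcurrentHashMap<>();

    /** Reuse the cached channel if it is still active, otherwise open a new one. */
    FakeChannel getOrCreate(String addr) {
        return channelTables.compute(addr, (k, cached) ->
                cached != null && cached.isActive() ? cached : new FakeChannel(k));
    }

    /** Models the proposed channelInactive() handling on the client side. */
    void onChannelInactive(FakeChannel channel) {
        channel.close();
        // Remove only if the table still maps this address to *this* channel,
        // so a stale event never evicts a newer replacement channel.
        channelTables.remove(channel.remoteAddr, channel);
    }

    public static void main(String[] args) {
        ChannelTableModel client = new ChannelTableModel();
        FakeChannel first = client.getOrCreate("127.0.0.1:10909");

        // The broker closes the idle connection; the client sees it go inactive.
        client.onChannelInactive(first);
        System.out.println(client.channelTables.size()); // prints 0

        FakeChannel second = client.getOrCreate("127.0.0.1:10909");
        System.out.println(second != first); // prints true
    }
}
```

Using `remove(key, value)` instead of `remove(key)` is a design choice of this sketch: a late inactive event for an old channel cannot evict a newer channel already cached for the same broker address.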

### [Furthermore]
What about the client's channel connections to the name server?
They are not affected: the client instance updates topic routes every 30
seconds even if no messages are sent at all, so those channels do not sit idle
long enough to be closed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

