slfan1989 commented on PR #1267:
URL: https://github.com/apache/ratis/pull/1267#issuecomment-2896262500
@szetszwo @adoroszlai While upgrading the ratis-examples module, I
discovered an issue where the `workerGroup` might not be properly shut down,
causing unit tests to fail with the following exception:
```
2025-05-21 05:09:14,262 WARN util.LeakDetector (LeakDetector.java:assertNoLeaks(175)) - 29/30) numLeaks == 1 > 0, will wait and retry ...
2025-05-21 05:09:15,265 WARN util.ReferenceCountedLeakDetector (ReferenceCountedLeakDetector.java:logLeakMessage(168)) - LEAK: (class org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoopGroup, count=3, value=org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoopGroup@756cf158)
java.lang.IllegalStateException: #leaks = 1 > 0, #leaks == set.size = 1
    at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
    at org.apache.ratis.util.LeakDetector$LeakTrackerSet.assertNoLeaks(LeakDetector.java:100)
    at org.apache.ratis.util.LeakDetector$LeakTrackerSet.getNumLeaks(LeakDetector.java:94)
    at org.apache.ratis.util.LeakDetector.assertNoLeaks(LeakDetector.java:178)
    at org.apache.ratis.server.impl.MiniRaftCluster.shutdown(MiniRaftCluster.java:892)
    at org.apache.ratis.grpc.MiniRaftClusterWithGrpc.shutdown(MiniRaftClusterWithGrpc.java:97)
    at org.apache.ratis.examples.filestore.FileStoreStreamingBaseTest.testFileStoreStreamSingleFile(FileStoreStreamingBaseTest.java:83)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
```
After debugging and analysis, I found that the issue was caused by the
`workerGroup` in `NettyClientStreamRpc` not releasing its resources.
https://github.com/apache/ratis/blob/0557974fa2d7409e9f5089a780a4b6024c53ec99/ratis-netty/src/main/java/org/apache/ratis/netty/client/NettyClientStreamRpc.java#L234-L242
This code seems reasonable to me: its asynchronous execution can improve
performance, but it carries a risk that resources are not fully released.
I added a small piece of code that, after the connection is closed, checks
whether the connection's `workerGroup` has been terminated. If it has not,
the code calls `shutdownGracefully()` on it.
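
For illustration, the check described above could look roughly like the sketch below. It is not the code from this PR: the class `WorkerGroupShutdownSketch` and the method `ensureTerminated` are hypothetical names, and a plain `java.util.concurrent.ExecutorService` stands in for the `NioEventLoopGroup` so the example runs without Netty on the classpath. Netty's `EventLoopGroup` also implements `ExecutorService`, so `isTerminated()` applies there as well; with Netty one would call `shutdownGracefully()` where this sketch calls `shutdown()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Ratis or Netty.
final class WorkerGroupShutdownSketch {
  private WorkerGroupShutdownSketch() {}

  /**
   * Called after the connection has been closed: if the worker group has not
   * terminated yet, request a shutdown and wait up to the given timeout.
   *
   * @return true if the worker group is terminated when this method returns
   */
  static boolean ensureTerminated(ExecutorService workerGroup, long timeout, TimeUnit unit) {
    if (workerGroup.isTerminated()) {
      return true; // resources already released, nothing to do
    }
    // Assumption: with a Netty EventLoopGroup this would be shutdownGracefully().
    workerGroup.shutdown();
    try {
      return workerGroup.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    // Stand-in for the client's workerGroup.
    ExecutorService workerGroup = Executors.newFixedThreadPool(2);
    workerGroup.submit(() -> { /* simulated I/O work */ });

    boolean terminated = ensureTerminated(workerGroup, 5, TimeUnit.SECONDS);
    System.out.println("terminated = " + terminated);
  }
}
```

Waiting with a bounded timeout keeps shutdown from blocking indefinitely, while still giving a leak check a good chance of seeing the group terminated.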