[ https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang updated RATIS-840:
-----------------------------
    Description:
*What's the problem?*
After running hadoop-ozone for 4 days, the datanode leaks memory. In a heap dump I found 460710 instances of GrpcLogAppender, but only 6 instances of SenderList, and each SenderList contains only 1-2 instances of GrpcLogAppender. There are also a lot of log entries related to [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]. {color:#DE350B}I will continue to look for the root cause of why [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428] is called so many times.{color}

{code:java}INFO impl.RaftServerImpl: 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: Restarting GrpcLogAppender for 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}

So a lot of GrpcLogAppender instances did not stop their daemon thread when they were removed from senders.

!image-2020-04-06-14-27-28-485.png!
!image-2020-04-06-14-27-39-582.png!

*What's the reason?*
From the code, the [removeSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L431] call in [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428] is not accompanied by a call to [LogAppender::stopAppender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L164]. The removed appender's daemon thread therefore keeps running, and the instance is never reclaimed.

*How to fix?*
To make sure stopAppender is never forgotten, I call it in [SenderList::removeAll|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L173].

> Memory leak of LogAppender
> --------------------------
>
>                 Key: RATIS-840
>                 URL: https://issues.apache.org/jira/browse/RATIS-840
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Priority: Major
>         Attachments: RATIS-840.001.patch, image-2020-04-06-14-27-28-485.png, image-2020-04-06-14-27-39-582.png
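The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the Ratis code or the attached patch: `SenderListSketch`, `Appender`, and `awaitTermination` are hypothetical stand-ins, and the only point shown is that stopping each appender inside `removeAll` means no caller can remove a sender and leave its daemon thread running.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified illustration only; the classes below are assumptions, not Ratis APIs.
class SenderListSketch {

  /** Hypothetical stand-in for LogAppender: owns a daemon thread that runs until stopped. */
  static class Appender {
    private volatile boolean running = true;
    private final Thread daemon;

    Appender(String name) {
      daemon = new Thread(this::run, name);
      daemon.setDaemon(true);
    }

    void start() {
      daemon.start();
    }

    private void run() {
      while (running) {
        try {
          Thread.sleep(10); // placeholder for the real append loop
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }

    /** Signals the daemon thread to exit; without this call the thread never ends. */
    void stopAppender() {
      running = false;
      daemon.interrupt();
    }

    boolean isAlive() {
      return daemon.isAlive();
    }

    /** Waits up to millis for the daemon thread to exit and reports whether it did. */
    boolean awaitTermination(long millis) {
      try {
        daemon.join(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return !daemon.isAlive();
    }
  }

  /** Hypothetical stand-in for SenderList: removal always stops the appender first. */
  static class SenderList {
    private final List<Appender> senders = new CopyOnWriteArrayList<>();

    void add(Appender appender) {
      senders.add(appender);
    }

    boolean removeAll(Collection<Appender> toRemove) {
      // Stop inside removeAll so that callers cannot forget to do it.
      toRemove.forEach(Appender::stopAppender);
      return senders.removeAll(toRemove);
    }

    int size() {
      return senders.size();
    }
  }

  public static void main(String[] args) {
    SenderList senders = new SenderList();
    Appender appender = new Appender("appender-to-follower");
    senders.add(appender);
    appender.start();

    senders.removeAll(Collections.singletonList(appender));
    System.out.println("terminated: " + appender.awaitTermination(1000));
    System.out.println("remaining senders: " + senders.size());
  }
}
```

Placing the stop call in the collection's removal method, rather than at each call site such as restartSender, keeps the thread's lifetime tied to membership in the list.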