[ 
https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-840:
-----------------------------
    Description: 
*What's the problem ?*

After running hadoop-ozone for 4 days, the datanode leaks memory. A heap dump 
shows 460710 instances of GrpcLogAppender, yet there are only 6 instances of 
SenderList, and each SenderList contains only 1-2 instances of 
GrpcLogAppender. The logs also contain many entries related to 
[LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]. 
{color:#DE350B}I will continue to look for the root cause of why 
[LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428] 
is called so many times.{color}
 {code:java}INFO impl.RaftServerImpl: 
1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: Restarting 
GrpcLogAppender for 
1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}
 

So many GrpcLogAppender instances did not stop their Daemon thread when they 
were removed from senders.

 !image-2020-04-06-14-27-28-485.png! 

 !image-2020-04-06-14-27-39-582.png! 
 

*What's the reason ?*

From the code, when 
[removeSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L431] 
is called in 
[LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428], 
it does not call 
[LogAppender::stopAppender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L164].
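
The effect described above can be reproduced with a small stand-alone sketch. It is not Ratis code: ToyAppender and LeakSketch are hypothetical names, and the polling loop only stands in for whatever the real appender thread does. It shows that removing an object from a list does not stop a thread the object started, and a running thread keeps its owner reachable, so it cannot be garbage collected.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for an appender that owns a daemon thread.
class ToyAppender {
  private volatile boolean running = true;
  private final Thread daemon;

  ToyAppender(String name) {
    // The thread's Runnable refers to this object, so the object stays
    // reachable for as long as the thread is running.
    daemon = new Thread(this::runLoop, name);
    daemon.setDaemon(true);
    daemon.start();
  }

  private void runLoop() {
    while (running) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        return;
      }
    }
  }

  void stop() {
    running = false;
    daemon.interrupt();
  }

  // Waits up to the given time for the thread to exit, then reports its state.
  boolean isAliveAfter(long millis) {
    try {
      daemon.join(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return daemon.isAlive();
  }
}

class LeakSketch {
  // Removes the appender from the list without stopping it.
  static boolean aliveAfterRemoveOnly() {
    List<ToyAppender> senders = new CopyOnWriteArrayList<>();
    ToyAppender appender = new ToyAppender("toy-appender");
    senders.add(appender);
    senders.remove(appender);            // no stop() call here
    boolean alive = appender.isAliveAfter(100);
    appender.stop();                     // clean up so the sketch itself does not leak
    return alive;
  }
}
```

Calling LeakSketch.aliveAfterRemoveOnly() reports whether the thread is still alive after the removal; in this sketch nothing tells the thread to exit until stop() is called.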

 

*How to fix ?*

To avoid forgetting to call stopAppender, I call stopAppender in 
[SenderList::removeAll|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L173].
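
A minimal sketch of the idea, assuming a simplified container; this is not the attached patch. StoppableAppender, ToySenderList and FixSketch are hypothetical names used only for illustration. The point is that the removal method itself stops every appender it removes, so no caller can forget to do it.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for an appender that owns a daemon thread.
class StoppableAppender {
  private volatile boolean running = true;
  private final Thread daemon;

  StoppableAppender() {
    daemon = new Thread(this::runLoop);
    daemon.setDaemon(true);
    daemon.start();
  }

  private void runLoop() {
    while (running) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        return;
      }
    }
  }

  void stop() {
    running = false;
    daemon.interrupt();
  }

  // Waits up to the given time for the thread to exit, then reports its state.
  boolean isAliveAfter(long millis) {
    try {
      daemon.join(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return daemon.isAlive();
  }
}

// Hypothetical simplified sender list: removal and stopping happen together.
class ToySenderList {
  private final List<StoppableAppender> senders = new CopyOnWriteArrayList<>();

  void add(StoppableAppender appender) {
    senders.add(appender);
  }

  boolean removeAll(Collection<StoppableAppender> toRemove) {
    boolean changed = false;
    for (StoppableAppender appender : toRemove) {
      if (senders.remove(appender)) {
        appender.stop();   // stop the thread of every appender actually removed
        changed = true;
      }
    }
    return changed;
  }

  int size() {
    return senders.size();
  }
}

class FixSketch {
  // Removes the appender through removeAll and reports whether its thread survived.
  static boolean aliveAfterRemoveAll() {
    ToySenderList list = new ToySenderList();
    StoppableAppender appender = new StoppableAppender();
    list.add(appender);
    list.removeAll(List.of(appender));
    return appender.isAliveAfter(1000);
  }
}
```

Putting the stop call inside the container keeps the invariant in one place: an appender that is no longer in the list no longer has a running thread.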
 


> Memory leak of LogAppender
> --------------------------
>
>                 Key: RATIS-840
>                 URL: https://issues.apache.org/jira/browse/RATIS-840
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Priority: Major
>         Attachments: RATIS-840.001.patch, image-2020-04-06-14-27-28-485.png, 
> image-2020-04-06-14-27-39-582.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
