[
https://issues.apache.org/jira/browse/RATIS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755793#comment-17755793
]
Ritesh Shukla commented on RATIS-878:
-------------------------------------
[~yjxxtd]reassigning this as this jira is important and there has been no
update for a few years.
> Infinite restart of LogAppender
> --------------------------------
>
> Key: RATIS-878
> URL: https://issues.apache.org/jira/browse/RATIS-878
> Project: Ratis
> Issue Type: Bug
> Reporter: runzhiwang
> Assignee: Duong
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png
>
>
> *What's the problem ?*
> When run hadoop-ozone for 4 days, datanode memory leak. When dump heap, I
> found there are 460710 instances of GrpcLogAppender. But there are only 6
> instances of SenderList, and each SenderList contains 1-2 instance of
> GrpcLogAppender. And there are a lot of logs related to
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428].
> {code:java}INFO impl.RaftServerImpl:
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState:
> Restarting GrpcLogAppender for
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}
>
> So there are a lot of GrpcLogAppender did not stop the Daemon Thread when
> removed from senders.
> !screenshot-2.png!
> !screenshot-3.png!
>
> *Why
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]
> so many times ?*
> 1. As the image shows, when remove group, SegmentedRaftLog will close, then
> GrpcLogAppender throw exception when find the SegmentedRaftLog was closed.
> Then GrpcLogAppender will be
> [restarted|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L94],
> and the new GrpcLogAppender throw exception again when find the
> SegmentedRaftLog was closed, then GrpcLogAppender will be restarted again ...
> . It results in an infinite restart of GrpcLogAppender.
> 2. Actually, when remove group, GrpcLogAppender will be stoped:
> RaftServerImpl::shutdown ->
> [RoleInfo::shutdownLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L266]
> -> LeaderState::stop -> LogAppender::stopAppender, then SegmentedRaftLog
> will be closed: RaftServerImpl::shutdown ->
> [ServerState:close|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L271]
> ... . Though RoleInfo::shutdownLeaderState called before ServerState:close,
> but the GrpcLogAppender was stopped asynchronously. So infinite restart of
> GrpcLogAppender happens, when GrpcLogAppender stop after SegmentedRaftLog
> close.
> !screenshot-4.png!
> More details please refer it here
> [RATIS-840|https://issues.apache.org/jira/browse/RATIS-840].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)