[
https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095000#comment-17095000
]
Hadoop QA commented on RATIS-840:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 28s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 9s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.logservice.TestLogServiceWithNetty |
| | ratis.logservice.server.TestMetaServer |
| | ratis.netty.TestRaftExceptionWithNetty |
| | ratis.netty.TestLeaderElectionWithNetty |
| | ratis.netty.TestGroupManagementWithNetty |
| | ratis.grpc.TestRaftWithGrpc |
| | ratis.grpc.TestRaftServerWithGrpc |
| | ratis.netty.TestRaftReconfigurationWithNetty |
| | ratis.server.simulation.TestRaftStateMachineExceptionWithSimulatedRpc |
| | ratis.netty.TestRaftSnapshotWithNetty |
| | ratis.grpc.TestRaftAsyncWithGrpc |
| | ratis.grpc.TestRaftSnapshotWithGrpc |
| | ratis.server.simulation.TestGroupManagementWithSimulatedRpc |
| | ratis.server.simulation.TestRaftSnapshotWithSimulatedRpc |
| | ratis.netty.TestRaftWithNetty |
| | ratis.netty.TestGroupInfoWithNetty |
| | ratis.examples.filestore.TestFileStoreWithGrpc |
| | ratis.examples.filestore.TestFileStoreWithNetty |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/ratis:date2020-04-29 |
| JIRA Issue | RATIS-840 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13001528/RATIS-840.004.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux f3c345ee734b 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / cac3336 |
| maven | version: Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Default Java | 1.8.0_252 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1309/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1309/testReport/ |
| Max. process+thread count | 1230 (vs. ulimit of 5000) |
| modules | C: ratis-server ratis-test U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1309/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Memory leak of LogAppender
> --------------------------
>
> Key: RATIS-840
> URL: https://issues.apache.org/jira/browse/RATIS-840
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Blocker
> Attachments: RATIS-840.001.patch, RATIS-840.002.patch,
> RATIS-840.003.patch, RATIS-840.004.patch, image-2020-04-06-14-27-28-485.png,
> image-2020-04-06-14-27-39-582.png, screenshot-1.png, screenshot-2.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *What's the problem?*
> After running hadoop-ozone for 4 days, the datanode leaked memory. A heap
> dump showed 460710 instances of GrpcLogAppender, but only 6 instances of
> SenderList, each containing 1-2 instances of GrpcLogAppender. There were
> also many log entries related to
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428].
> {code:java}INFO impl.RaftServerImpl:
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState:
> Restarting GrpcLogAppender for
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}
>
> So many GrpcLogAppender instances did not stop their Daemon thread when
> they were removed from senders.
> !image-2020-04-06-14-27-28-485.png!
> !image-2020-04-06-14-27-39-582.png!
>
> *Why is
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]
> called so many times?*
> 1. As the image shows, when a group is removed, SegmentedRaftLog is closed,
> and GrpcLogAppender then throws an exception when it finds that the
> SegmentedRaftLog was closed. The GrpcLogAppender is then
> [restarted|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L94],
> and the new GrpcLogAppender throws the same exception when it finds the
> SegmentedRaftLog closed, so it is restarted again, and so on. The result is
> an infinite restart of GrpcLogAppender.
> 2. Actually, when a group is removed, GrpcLogAppender is stopped first:
> RaftServerImpl::shutdown ->
> [RoleInfo::shutdownLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L266]
> -> LeaderState::stop -> LogAppender::stopAppender. Then SegmentedRaftLog
> is closed: RaftServerImpl::shutdown ->
> [ServerState::close|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L271]
> ... . Although RoleInfo::shutdownLeaderState is called before
> ServerState::close, the GrpcLogAppender is stopped asynchronously. So the
> infinite restart of GrpcLogAppender happens when GrpcLogAppender stops after
> SegmentedRaftLog has been closed.
> !screenshot-1.png!
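
The restart loop described above can be illustrated with a small, self-contained sketch. Everything in it (the `Leader` class, its `running` and `logOpen` flags, and the `simulate` helper) is hypothetical and only models the ordering problem; it is not Ratis code and not necessarily what RATIS-840.004.patch does. It assumes that one possible guard is to refuse restarts once shutdown has begun.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RestartGuardSketch {
  /** Hypothetical stand-in for the leader-side state that owns the appenders. */
  static final class Leader {
    final AtomicBoolean running = new AtomicBoolean(true);
    final AtomicBoolean logOpen = new AtomicBoolean(true);
    final AtomicInteger restarts = new AtomicInteger();

    /** Models shutdown: mark the leader stopped first, then close the log. */
    void shutdown() {
      running.set(false);
      logOpen.set(false);
    }

    /** Restarts a sender only while running; returns whether a restart happened. */
    boolean restartSender() {
      if (!running.get()) {
        return false; // shutting down: do not create another appender
      }
      restarts.incrementAndGet();
      return true;
    }
  }

  /** Runs appenders until one survives, a restart is refused, or maxAttempts is hit. */
  static int simulate(Leader leader, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (leader.logOpen.get()) {
        break; // the log is readable, so the appender keeps working
      }
      // The log is closed: the appender fails and asks for a replacement.
      if (!leader.restartSender()) {
        break; // the guard refused, so the loop ends here
      }
    }
    return leader.restarts.get();
  }

  public static void main(String[] args) {
    // Log closed while the leader still looks "running": restarts never stop.
    Leader racy = new Leader();
    racy.logOpen.set(false);
    System.out.println("restarts without guard taking effect: " + simulate(racy, 1000));

    // Shutdown marks the leader stopped before the log closes: no restarts.
    Leader stopped = new Leader();
    stopped.shutdown();
    System.out.println("restarts after shutdown: " + simulate(stopped, 1000));
  }
}
```

In this model the first case restarts until the artificial cap of 1000 is reached, while the second case performs no restarts because the `running` check fails before a new appender is created.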
> *Why did GrpcLogAppender not stop the Daemon thread when removed from
> senders?*
> I found many GrpcLogAppender instances blocked inside log4j. I think
> GrpcLogAppender restarts too quickly and then blocks in log4j.
> !screenshot-2.png!
> *Can the new GrpcLogAppender work normally?*
> 1. Even without the above problem, the newly created GrpcLogAppender still
> cannot work normally.
> 2. When a new GrpcLogAppender is created, a new FollowerInfo is also created:
> LeaderState::addAndStartSenders ->
> LeaderState::addSenders -> RaftServerImpl::newLogAppender -> [new
> FollowerInfo|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L129]
> 3. When the newly created GrpcLogAppender appends an entry to the follower,
> the follower responds with SUCCESS.
> 4. Then LeaderState::updateCommit -> [LeaderState::getMajorityMin |
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L599]
> ->
> [voterLists.get(0) |
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L607].
> {color:#DE350B}An error happens because voterLists.get(0) returns the
> FollowerInfo of the old GrpcLogAppender, not the FollowerInfo of the new
> GrpcLogAppender. {color}
> 5. Because the majority commit obtained from the FollowerInfo of the old
> GrpcLogAppender never changes, the leader cannot update the commit even
> though the follower has appended the entry successfully. So the newly
> created GrpcLogAppender can never work normally.
> 6. The unit test runTestRestartLogAppender passes only because it does not
> stop the old GrpcLogAppender, so it is the old GrpcLogAppender, not the new
> one, that appends entries to the follower. If the old GrpcLogAppender is
> stopped, runTestRestartLogAppender will fail.
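
The stale FollowerInfo problem in steps 2 to 5 comes down to two objects tracking the same peer. The sketch below is a simplified model under stated assumptions: the `Leader`, `Appender`, and `FollowerInfo` classes here are hypothetical miniatures, `minVoterMatchIndex` stands in for the real majority computation, and reusing the tracked FollowerInfo on restart is just one possible remedy, not a description of the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FollowerInfoReuseSketch {
  /** Hypothetical per-follower progress record. */
  static final class FollowerInfo {
    final String peerId;
    long matchIndex;

    FollowerInfo(String peerId) {
      this.peerId = peerId;
    }
  }

  /** Hypothetical appender that records progress in the FollowerInfo it was given. */
  static final class Appender {
    final FollowerInfo follower;

    Appender(FollowerInfo follower) {
      this.follower = follower;
    }

    void onAppendSuccess(long index) {
      follower.matchIndex = Math.max(follower.matchIndex, index);
    }
  }

  static final class Leader {
    final Map<String, FollowerInfo> followers = new HashMap<>();
    final List<FollowerInfo> voters = new ArrayList<>();

    /** Registers a follower; the voter list keeps a reference to its FollowerInfo. */
    Appender addSender(String peerId) {
      FollowerInfo info = new FollowerInfo(peerId);
      followers.put(peerId, info);
      voters.add(info);
      return new Appender(info);
    }

    /** Problem pattern: the replacement appender gets a FollowerInfo the voter list never sees. */
    Appender restartWithNewInfo(String peerId) {
      return new Appender(new FollowerInfo(peerId));
    }

    /** Possible remedy: the replacement appender reuses the tracked FollowerInfo. */
    Appender restartReusingInfo(String peerId) {
      return new Appender(followers.get(peerId));
    }

    /** Simplified stand-in for the majority computation: minimum over voters. */
    long minVoterMatchIndex() {
      return voters.stream().mapToLong(f -> f.matchIndex).min().orElse(0L);
    }
  }

  public static void main(String[] args) {
    Leader leader = new Leader();
    leader.addSender("f1");

    Appender stale = leader.restartWithNewInfo("f1");
    stale.onAppendSuccess(10);
    System.out.println(leader.minVoterMatchIndex()); // prints 0: progress is invisible

    Appender reused = leader.restartReusingInfo("f1");
    reused.onAppendSuccess(10);
    System.out.println(leader.minVoterMatchIndex()); // prints 10: progress is visible
  }
}
```

The first restart mirrors step 4: the follower acknowledges the entry, but the leader still reads the old record and cannot advance the commit.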
--
This message was sent by Atlassian Jira
(v8.3.4#803005)