[ 
https://issues.apache.org/jira/browse/RATIS-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1074:
-----------------------------
    Description: 
GrpcLogAppender improperly decrease nextIndex to 1, which will trigger 
installSnapshot request to followers, and make pipeline be vulnerable.

 

*Below is the interaction between leader and the follower.*

Follower side, received an install snapshot request, and closed the pipeline.
{code:java}
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: receive 
installSnapshot: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb->591f38ce-0b11-4003-8528-7a5802f89ea8#0-t514,notify:(t:392,
 i:992920)
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Leader change notification received for group: group-622E07D885D8 with new 
leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: change Leader from 
null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 514 for installSnapshot, 
leader elected after 50593ms
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: notifyInstallSnapshot: 
nextIndex is 2248004 but the leader's first available index is 992920.
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] WARN 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Install snapshot notification received from Leader with termIndex: (t:392, 
i:992920), terminating pipeline: group-622E07D885D8
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 pipeline Action CLOSE on pipeline 
PipelineID=433c67af-f834-4959-bcf7-622e07d885d8.Reason : 
591f38ce-0b11-4003-8528-7a5802f89ea8 closes pipeline when installSnapshot from 
leader because leader snapshot doesn't contain any data to replay, all the log 
entries prior to the snapshot might have been purged.So follower should not try 
to install snapshot from leader butcan close the pipeline here. It's in 
follower state for 1761ms
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: StateMachine 
successfully installed snapshot index 992920. Reloading the StateMachine.
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker: 
flushIndex: setUnconditionally 2248003 -> 992920
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker: 
safeCacheEvictIndex: setUnconditionally 2248001 -> 992920
 ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: reply installSnapshot: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-591f38ce-0b11-4003-8528-7a5802f89ea8#0:FAIL-t514,IN_PROGRESS{code}

  Leader side, just take charge, got vote from another follower, but miss vote 
from this follower
{code:java}
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:21,499 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.LeaderElection: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
begin an election at term 510 for 2248004: 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.LeaderElection: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
Election PASSED; received 1 response(s) 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-e5596d61-c021-4a4b-89bc-6d8e6d4f1baf#0:OK-t510]
 and 0 exception(s); 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8:t510, leader=null, 
voted=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb, 
raftlog=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-SegmentedRaftLog:OPENED:c2248003,f2248004,i2248004,
 conf=2248004: [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RoleInfo: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb: shutdown LeaderElection
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RaftServerImpl: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: changes role from 
CANDIDATE to LEADER at term 510 for changeToLeader
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Leader change notification received for group: group-622E07D885D8 with new 
leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RaftServerImpl: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: change Leader from 
null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 510 for becomeLeader, 
leader elected after 20419ms
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.staging.catchup.gap = 1000 (default)
 ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.sleep.time = 
25ms (default){code}
Due to Grpc exception, leader decrease next index of follower to 1. Since 
leader has purged log, it has no choice but send out an installSnapshot request 
to follower.
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,531 
[grpc-default-executor-345] WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8-AppendLogResponseHandler:
 Failed appendEntries: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,531 
[grpc-default-executor-345] INFO org.apache.ratis.server.impl.FollowerInfo: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8:
 nextIndex: updateUnconditionally 2248006 -> 1
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,672 
[org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$601/1151550579@71fbee17]
 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8-GrpcLogAppender:
 followerNextIndex = 1 but logStartIndex = 992920, notify follower to install 
snapshot-(t:392, i:992920)
 

  was:
GrpcLogAppender improperly decrease nextIndex to 1, which will trigger 
installSnapshot request to followers, and make pipeline be vulnerable.

 

*Below is the interaction between leader and the follower.*

Follower side, received an install snapshot request, and closed the pipeline.
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: receive 
installSnapshot: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb->591f38ce-0b11-4003-8528-7a5802f89ea8#0-t514,notify:(t:392,
 i:992920)
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Leader change notification received for group: group-622E07D885D8 with new 
leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: change Leader from 
null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 514 for installSnapshot, 
leader elected after 50593ms
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: notifyInstallSnapshot: 
nextIndex is 2248004 but the leader's first available index is 992920.
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
[grpc-default-executor-628] WARN 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Install snapshot notification received from Leader with termIndex: (t:392, 
i:992920), terminating pipeline: group-622E07D885D8
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 pipeline Action CLOSE on pipeline 
PipelineID=433c67af-f834-4959-bcf7-622e07d885d8.Reason : 
591f38ce-0b11-4003-8528-7a5802f89ea8 closes pipeline when installSnapshot from 
leader because leader snapshot doesn't contain any data to replay, all the log 
entries prior to the snapshot might have been purged.So follower should not try 
to install snapshot from leader butcan close the pipeline here. It's in 
follower state for 1761ms
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: StateMachine 
successfully installed snapshot index 992920. Reloading the StateMachine.
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker: 
flushIndex: setUnconditionally 2248003 -> 992920
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker: 
safeCacheEvictIndex: setUnconditionally 2248001 -> 992920
ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
[grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: reply installSnapshot: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-591f38ce-0b11-4003-8528-7a5802f89ea8#0:FAIL-t514,IN_PROGRESS
 

Leader side, just take charge, got vote from another follower, but miss vote 
from this follower
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:21,499 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.LeaderElection: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
begin an election at term 510 for 2248004: 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.LeaderElection: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
Election PASSED; received 1 response(s) 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-e5596d61-c021-4a4b-89bc-6d8e6d4f1baf#0:OK-t510]
 and 0 exception(s); 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8:t510, leader=null, 
voted=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb, 
raftlog=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-SegmentedRaftLog:OPENED:c2248003,f2248004,i2248004,
 conf=2248004: [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RoleInfo: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb: shutdown LeaderElection
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RaftServerImpl: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: changes role from 
CANDIDATE to LEADER at term 510 for changeToLeader
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Leader change notification received for group: group-622E07D885D8 with new 
leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.impl.RaftServerImpl: 
6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: change Leader from 
null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 510 for becomeLeader, 
leader elected after 20419ms
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.staging.catchup.gap = 1000 (default)
ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
[6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.sleep.time = 
25ms (default)
 

Due to Grpc exception, leader decrease next index of follower to 1. Since 
leader has purged log, it has no choice but send out an install snapsho

 


>  GrpcLogAppender improperly decrease nextIndex to 1, which will trigger 
> installSnapshot request to followers.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-1074
>                 URL: https://issues.apache.org/jira/browse/RATIS-1074
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>              Labels: MultiRaft, dense-storage
>             Fix For: 1.1.0
>
>
> GrpcLogAppender improperly decrease nextIndex to 1, which will trigger 
> installSnapshot request to followers, and make pipeline be vulnerable.
>  
> *Below is the interaction between leader and the follower.*
> Follower side, received an install snapshot request, and closed the pipeline.
> {code:java}
> ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
> [grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: receive 
> installSnapshot: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb->591f38ce-0b11-4003-8528-7a5802f89ea8#0-t514,notify:(t:392,
>  i:992920)
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
> [grpc-default-executor-628] INFO 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  Leader change notification received for group: group-622E07D885D8 with new 
> leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
> [grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: change Leader from 
> null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 514 for installSnapshot, 
> leader elected after 50593ms
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
> [grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: 
> notifyInstallSnapshot: nextIndex is 2248004 but the leader's first available 
> index is 992920.
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,719 
> [grpc-default-executor-628] WARN 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  Install snapshot notification received from Leader with termIndex: (t:392, 
> i:992920), terminating pipeline: group-622E07D885D8
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
> [grpc-default-executor-628] ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=433c67af-f834-4959-bcf7-622e07d885d8.Reason : 
> 591f38ce-0b11-4003-8528-7a5802f89ea8 closes pipeline when installSnapshot 
> from leader because leader snapshot doesn't contain any data to replay, all 
> the log entries prior to the snapshot might have been purged.So follower 
> should not try to install snapshot from leader butcan close the pipeline 
> here. It's in follower state for 1761ms
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
> [grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: StateMachine 
> successfully installed snapshot index 992920. Reloading the StateMachine.
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
> [grpc-default-executor-628] INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker:
>  flushIndex: setUnconditionally 2248003 -> 992920
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
> [grpc-default-executor-628] INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8-SegmentedRaftLogWorker:
>  safeCacheEvictIndex: setUnconditionally 2248001 -> 992920
>  ozone-root-datanode-172.16.90.69.log.9:2020-09-21 19:28:49,720 
> [grpc-default-executor-628] INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8@group-622E07D885D8: reply 
> installSnapshot: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-591f38ce-0b11-4003-8528-7a5802f89ea8#0:FAIL-t514,IN_PROGRESS{code}
>   Leader side, just take charge, got vote from another follower, but miss 
> vote from this follower
> {code:java}
> ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:21,499 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
> begin an election at term 510 for 2248004: 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
> 591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
> e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143: 
> Election PASSED; received 1 response(s) 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb<-e5596d61-c021-4a4b-89bc-6d8e6d4f1baf#0:OK-t510]
>  and 0 exception(s); 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8:t510, leader=null, 
> voted=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb, 
> raftlog=6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-SegmentedRaftLog:OPENED:c2248003,f2248004,i2248004,
>  conf=2248004: [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb:172.16.90.54:9858:0, 
> 591f38ce-0b11-4003-8528-7a5802f89ea8:172.16.90.69:9858:1, 
> e5596d61-c021-4a4b-89bc-6d8e6d4f1baf:172.16.90.56:9858:0], old=null
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.impl.RoleInfo: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb: shutdown LeaderElection
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: changes role from 
> CANDIDATE to LEADER at term 510 for changeToLeader
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,580 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  Leader change notification received for group: group-622E07D885D8 with new 
> leaderId: 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8: change Leader from 
> null to 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb at term 510 for becomeLeader, 
> leader elected after 20419ms
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.staging.catchup.gap = 1000 (default)
>  ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:26,588 
> [6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8-LeaderElection143] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.sleep.time 
> = 25ms (default){code}
> Due to Grpc exception, leader decrease next index of follower to 1. Since 
> leader has purged log, it has no choice but send out an installSnapshot 
> request to follower.
> ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,531 
> [grpc-default-executor-345] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8-AppendLogResponseHandler:
>  Failed appendEntries: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,531 
> [grpc-default-executor-345] INFO org.apache.ratis.server.impl.FollowerInfo: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8:
>  nextIndex: updateUnconditionally 2248006 -> 1
> ozone-root-datanode-172.16.90.54.log.3:2020-09-21 19:28:51,672 
> [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$601/1151550579@71fbee17]
>  INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
> 6769b0ef-fb4a-4235-bcf0-f1623fcd74cb@group-622E07D885D8->591f38ce-0b11-4003-8528-7a5802f89ea8-GrpcLogAppender:
>  followerNextIndex = 1 but logStartIndex = 992920, notify follower to install 
> snapshot-(t:392, i:992920)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to