Mukul Kumar Singh created RATIS-209:
---------------------------------------
Summary: StateMachine updater may miss writeLog Request after a
new Leader is chosen.
Key: RATIS-209
URL: https://issues.apache.org/jira/browse/RATIS-209
Project: Ratis
Issue Type: Bug
Affects Versions: 0.2.0-alpha
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
This issue is happening on ozone for write chunk request.
1) currently write chunk request is processed in two phases, in the first phase
the user data is written to the follower as part of {{writeStateMachineData}}
and then the entry is committed to the follower as part of {{commit}}.
2) The issue which is hit right now is the case where a)
{{writeStateMachineData}} didn't happen for a particular chunk however b) the
commit entry is still processed. this leads to a case where a corresponding
stateMachineFuture is not present in the hashmap.
{code}
2018-02-12 00:26:30,097 INFO org.apache.ratis.server.impl.FollowerState:
172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:1773, electionTimeout:873ms
2018-02-12 00:26:30,098 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 3 for
changeToCandidate
2018-02-12 00:26:30,100 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858: change Leader from 172.26.32.232_9858 to null at term 3 for
initElection
2018-02-12 00:26:32,869 INFO org.apache.ratis.server.impl.LeaderElection:
172.26.32.228_9858: begin an election in Term 4
2018-02-12 00:26:32,901 INFO
org.apache.ratis.grpc.server.RaftServerProtocolService: 172.26.32.228_9858:
appendEntries completed
2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.LeaderElection:
172.26.32.228_9858: Election REJECTED; received 2 response(s)
[172.26.32.228_9858<-172.26.32.230_9858#0:FAIL-t4,
172.26.32.228_9858<-172.26.32.232_9858#0:FAIL-t4] and 0 exception(s);
172.26.32.228_9858:t4, leader=null, voted=172.26.32.228_9858, raftlog=[(t:3,
i:10711)], conf=[172.26.32.228_9858:172.26.32.228:9858,
172.26.32.230_9858:172.26.32.230:9858, 172.26.32.232_9858:172.26.32.232:9858],
old=null
2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 4 for
changeToFollower
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.FollowerState:
172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:5624, electionTimeout:975ms
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 5 for
changeToCandidate
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 5 for
changeToFollower
2018-02-12 00:26:39,520 INFO org.apache.ratis.server.impl.RaftServerImpl:
172.26.32.228_9858: change Leader from null to 172.26.32.232_9858 at term 5 for
appendEntries
{code}
{code}
2018-02-12 00:31:12,400 ERROR org.apache.ratis.server.impl.StateMachineUpdater:
Terminating with exit status 2: StateMachineUpdater-172.26.32.228_9858: the
StateMachineUpdater hits Throwable
java.lang.NullPointerException
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:254)
at
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1001)
at
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:151)
at java.lang.Thread.run(Thread.java:745)
2018-02-12 00:31:12,406 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at
y128.l42scl.hortonworks.com/172.26.32.228
************************************************************/
*** shutting down gRPC server since JVM is shutting down
*** server shut down
***
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)