Mukul Kumar Singh created RATIS-209:
---------------------------------------

             Summary: StateMachine updater may miss writeLog Request after a new Leader is chosen.
                 Key: RATIS-209
                 URL: https://issues.apache.org/jira/browse/RATIS-209
             Project: Ratis
          Issue Type: Bug
    Affects Versions: 0.2.0-alpha
            Reporter: Mukul Kumar Singh
            Assignee: Mukul Kumar Singh


This issue occurs on Ozone for write chunk requests.

1) Currently, a write chunk request is processed in two phases. In the first 
phase, the user data is written to the follower as part of 
{{writeStateMachineData}}; in the second, the entry is committed on the 
follower as part of {{commit}}.

2) The issue being hit is the case where a) {{writeStateMachineData}} did not 
happen for a particular chunk, but b) the commit entry is still processed. 
As a result, the corresponding stateMachineFuture is not present in the 
hashmap when the commit is applied.
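
The failure mode described in 1) and 2) can be illustrated with a small, self-contained sketch. This is not the Ozone {{ContainerStateMachine}} code: the class {{TwoPhaseWriteSketch}}, the {{pendingWrites}} map, and both {{applyTransaction*}} variants are hypothetical names chosen for illustration, and the guarded variant only shows one possible way to surface the missing entry, not the fix adopted for this issue.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class TwoPhaseWriteSketch {
    // Hypothetical stand-in for the hashmap of in-flight writes, keyed by log index.
    private final Map<Long, CompletableFuture<String>> pendingWrites =
            new ConcurrentHashMap<>();

    // Phase 1: write the chunk data and remember the resulting future.
    public CompletableFuture<String> writeStateMachineData(long logIndex, String chunk) {
        CompletableFuture<String> future =
                CompletableFuture.completedFuture("written:" + chunk);
        pendingWrites.put(logIndex, future);
        return future;
    }

    // Phase 2, unguarded: assumes phase 1 always ran for this log index.
    public CompletableFuture<String> applyTransactionUnguarded(long logIndex) {
        CompletableFuture<String> future = pendingWrites.remove(logIndex);
        // Throws NullPointerException when phase 1 was skipped and future is null.
        return future.thenApply(result -> "committed:" + result);
    }

    // Phase 2, guarded: reports the missing entry instead of dereferencing null.
    public CompletableFuture<String> applyTransactionGuarded(long logIndex) {
        CompletableFuture<String> future = pendingWrites.remove(logIndex);
        if (future == null) {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException(
                    "no pending write for log index " + logIndex));
            return failed;
        }
        return future.thenApply(result -> "committed:" + result);
    }

    public static void main(String[] args) {
        TwoPhaseWriteSketch sm = new TwoPhaseWriteSketch();

        // Normal case: both phases run for log index 1.
        sm.writeStateMachineData(1L, "chunk-1");
        System.out.println(sm.applyTransactionGuarded(1L).join());

        // Problem case: the commit for log index 2 arrives without phase 1.
        System.out.println(sm.applyTransactionGuarded(2L).isCompletedExceptionally());
        try {
            sm.applyTransactionUnguarded(3L);
        } catch (NullPointerException e) {
            System.out.println("unguarded apply hit a NullPointerException");
        }
    }
}
```

In this sketch the unguarded path fails as soon as a commit is applied for an index that never went through phase 1, which is the same shape of failure as a missing stateMachineFuture lookup.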

{code}
2018-02-12 00:26:30,097 INFO org.apache.ratis.server.impl.FollowerState: 
172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:1773, electionTimeout:873ms
2018-02-12 00:26:30,098 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 3 for 
changeToCandidate
2018-02-12 00:26:30,100 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858: change Leader from 172.26.32.232_9858 to null at term 3 for 
initElection
2018-02-12 00:26:32,869 INFO org.apache.ratis.server.impl.LeaderElection: 
172.26.32.228_9858: begin an election in Term 4
2018-02-12 00:26:32,901 INFO 
org.apache.ratis.grpc.server.RaftServerProtocolService: 172.26.32.228_9858: 
appendEntries completed
2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.LeaderElection: 
172.26.32.228_9858: Election REJECTED; received 2 response(s) 
[172.26.32.228_9858<-172.26.32.230_9858#0:FAIL-t4, 
172.26.32.228_9858<-172.26.32.232_9858#0:FAIL-t4] and 0 exception(s); 
172.26.32.228_9858:t4, leader=null, voted=172.26.32.228_9858, raftlog=[(t:3, 
i:10711)], conf=[172.26.32.228_9858:172.26.32.228:9858, 
172.26.32.230_9858:172.26.32.230:9858, 172.26.32.232_9858:172.26.32.232:9858], 
old=null
2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 4 for 
changeToFollower
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.FollowerState: 
172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:5624, electionTimeout:975ms
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 5 for 
changeToCandidate
2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 5 for 
changeToFollower
2018-02-12 00:26:39,520 INFO org.apache.ratis.server.impl.RaftServerImpl: 
172.26.32.228_9858: change Leader from null to 172.26.32.232_9858 at term 5 for 
appendEntries
{code}


{code}
2018-02-12 00:31:12,400 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 
Terminating with exit status 2: StateMachineUpdater-172.26.32.228_9858: the 
StateMachineUpdater hits Throwable
java.lang.NullPointerException
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:254)
        at 
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1001)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:151)
        at java.lang.Thread.run(Thread.java:745)
2018-02-12 00:31:12,406 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at 
y128.l42scl.hortonworks.com/172.26.32.228
************************************************************/


*** shutting down gRPC server since JVM is shutting down
*** server shut down
***
{code}


