[
https://issues.apache.org/jira/browse/RATIS-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407678#comment-16407678
]
Tsz Wo Nicholas Sze commented on RATIS-209:
-------------------------------------------
> ... The appendEntries are async, therefore the follower can append the
> entries in future and return the next index. ...
appendEntries returned a combined future; see RaftLogWorker.WriteLog. So that
the stateMachineFuture must be completed when combined future is completed.
> StateMachine updater may miss writeLog Request after a new Leader is chosen.
> ----------------------------------------------------------------------------
>
> Key: RATIS-209
> URL: https://issues.apache.org/jira/browse/RATIS-209
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 0.2.0-alpha
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
>
> This issue is happening on ozone for write chunk request.
> 1) currently write chunk request is processed in two phases, in the first
> phase the user data is written to the follower as part of
> {{writeStateMachineData}} and then the entry is committed to the follower as
> part of {{commit}}.
> 2) The issue which is hit right now is the case where a)
> {{writeStateMachineData}} didn't happen for a particular chunk however b) the
> commit entry is still processed. this leads to a case where a corresponding
> stateMachineFuture is not present in the hashmap.
> {code}
> 2018-02-12 00:26:30,097 INFO org.apache.ratis.server.impl.FollowerState:
> 172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:1773,
> electionTimeout:873ms
> 2018-02-12 00:26:30,098 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 3 for
> changeToCandidate
> 2018-02-12 00:26:30,100 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858: change Leader from 172.26.32.232_9858 to null at term 3
> for initElection
> 2018-02-12 00:26:32,869 INFO org.apache.ratis.server.impl.LeaderElection:
> 172.26.32.228_9858: begin an election in Term 4
> 2018-02-12 00:26:32,901 INFO
> org.apache.ratis.grpc.server.RaftServerProtocolService: 172.26.32.228_9858:
> appendEntries completed
> 2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.LeaderElection:
> 172.26.32.228_9858: Election REJECTED; received 2 response(s)
> [172.26.32.228_9858<-172.26.32.230_9858#0:FAIL-t4,
> 172.26.32.228_9858<-172.26.32.232_9858#0:FAIL-t4] and 0 exception(s);
> 172.26.32.228_9858:t4, leader=null, voted=172.26.32.228_9858, raftlog=[(t:3,
> i:10711)], conf=[172.26.32.228_9858:172.26.32.228:9858,
> 172.26.32.230_9858:172.26.32.230:9858,
> 172.26.32.232_9858:172.26.32.232:9858], old=null
> 2018-02-12 00:26:33,217 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 4 for
> changeToFollower
> 2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.FollowerState:
> 172.26.32.228_9858 changes to CANDIDATE, lastRpcTime:5624,
> electionTimeout:975ms
> 2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858 changes role from FOLLOWER to CANDIDATE at term 5 for
> changeToCandidate
> 2018-02-12 00:26:39,518 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858 changes role from CANDIDATE to FOLLOWER at term 5 for
> changeToFollower
> 2018-02-12 00:26:39,520 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 172.26.32.228_9858: change Leader from null to 172.26.32.232_9858 at term 5
> for appendEntries
> {code}
> {code}
> 2018-02-12 00:31:12,400 ERROR
> org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit
> status 2: StateMachineUpdater-172.26.32.228_9858: the StateMachineUpdater
> hits Throwable
> java.lang.NullPointerException
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:254)
> at
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1001)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:151)
> at java.lang.Thread.run(Thread.java:745)
> 2018-02-12 00:31:12,406 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at
> y128.l42scl.hortonworks.com/172.26.32.228
> ************************************************************/
> *** shutting down gRPC server since JVM is shutting down
> *** server shut down
> ***
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)