Mukul Kumar Singh created HDFS-13024:
----------------------------------------
Summary: Ozone: ContainerStateMachine should synchronize
operations between createContainer op and writeChunk
Key: HDFS-13024
URL: https://issues.apache.org/jira/browse/HDFS-13024
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ozone
Affects Versions: HDFS-7240
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
Fix For: HDFS-7240
This issue appears after HDFS-12853. With HDFS-12853, the writeChunk op has been
divided into two stages: 1) the log write phase (where the state machine data is
written) and 2) {{applyTransaction}}.
With a 3-node Ratis ring, the Ratis leader appends the log entry to its log and
forwards it to its followers. However, there is no guarantee on when the
followers will apply the log entry to the state machine in {{applyTransaction}}.
The issue occurs in the following order:
1) Leader accepts the create container call
2) Leader adds the entry to its log and forwards it to the followers
3) Followers append the entry to their logs and ack to the Raft leader (please
note that the transaction still hasn't been applied on the followers)
4) Leader applies the transaction and replies to the client
5) The write chunk call is sent to the leader
6) Leader forwards the call to the followers
7) Followers try to apply the write chunk entry by calling {{Dispatcher#dispatch}};
however, the create container entry appended in step 3 still hasn't been applied
8) The write chunk call fails on the followers.
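
One possible way to order the two operations on a node can be sketched as
follows. This is only an illustration of the synchronization described in the
summary, not the actual ContainerStateMachine code or the eventual patch. The
class name {{ContainerOpOrdering}}, its methods, and the string result are all
hypothetical. The idea is to keep one future per container that completes once
the create container entry has been applied, and to chain every write chunk
for that container behind it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: makes writeChunk wait for createContainer on the same node. */
class ContainerOpOrdering {
    // One future per container name; completed once createContainer has been applied locally.
    private final Map<String, CompletableFuture<Void>> created = new ConcurrentHashMap<>();

    // Returns the existing future for the container, or registers a new pending one.
    private CompletableFuture<Void> createFuture(String containerName) {
        return created.computeIfAbsent(containerName, k -> new CompletableFuture<>());
    }

    /** Called when the createContainer entry is applied on this node. */
    void applyCreateContainer(String containerName) {
        // The real container creation would happen here (assumption: it has succeeded).
        createFuture(containerName).complete(null);
    }

    /** Called when a writeChunk entry is processed; the write runs only after the create. */
    CompletableFuture<String> applyWriteChunk(String containerName, String chunkName) {
        return createFuture(containerName)
            // Placeholder for the real chunk write; returns a description for illustration.
            .thenApply(v -> "wrote " + chunkName + " to " + containerName);
    }
}
```

With this arrangement, a write chunk that reaches a follower before the create
container entry has been applied stays pending instead of failing, and it
completes as soon as {{applyCreateContainer}} runs for the same container.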