Swaminathan Balachandran created HDDS-12236:
-----------------------------------------------
Summary: ContainerStateMachine should not apply future
transactions in the event of failure
Key: HDDS-12236
URL: https://issues.apache.org/jira/browse/HDDS-12236
Project: Apache Ozone
Issue Type: Bug
Reporter: Swaminathan Balachandran
Assignee: Swaminathan Balachandran
Currently, when an apply transaction fails on ContainerStateMachine, the
[isStateMachineHealthy|https://github.com/apache/ozone/blob/bf19af946aa08d2bea5064d79513a196b9bbf646/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L568]
flag is set to false. However, this flag is not checked when subsequent
transactions are applied, so every remaining transaction in the pipeline can
still be applied to the state machine, potentially leaving it in an
inconsistent state. For instance, if a write chunk fails and the following
putBlock transaction succeeds, the container is left in an inconsistent state:
the committed block references chunk data that was never successfully written.
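The failure mode can be modelled with a small, self-contained Java sketch. This is not Ozone or Ratis code: `ToyContainer`, `Txn`, and `UnguardedStateMachine` are hypothetical names, and the real ContainerStateMachine applies transactions asynchronously through Ratis. The sketch only assumes the behaviour described above: a failed apply clears the health flag, nothing reads it, and a later putBlock is therefore still applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified container: chunks written and blocks committed.
class ToyContainer {
    final Set<String> chunks = new HashSet<>();
    final Map<String, List<String>> blocks = new HashMap<>();

    void writeChunk(String chunk, boolean simulateFailure) {
        if (simulateFailure) {
            throw new IllegalStateException("write chunk failed: " + chunk);
        }
        chunks.add(chunk);
    }

    void putBlock(String block, List<String> chunkList) {
        blocks.put(block, new ArrayList<>(chunkList));
    }

    // Consistent only if every committed block references chunks that exist.
    boolean isConsistent() {
        for (List<String> refs : blocks.values()) {
            if (!chunks.containsAll(refs)) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical transaction: either a write chunk or a putBlock.
final class Txn {
    final String type;
    final String id;
    final List<String> chunkList;
    final boolean fail;

    private Txn(String type, String id, List<String> chunkList, boolean fail) {
        this.type = type;
        this.id = id;
        this.chunkList = chunkList;
        this.fail = fail;
    }

    static Txn writeChunk(String chunk, boolean fail) {
        return new Txn("WriteChunk", chunk, List.of(), fail);
    }

    static Txn putBlock(String block, List<String> chunkList) {
        return new Txn("PutBlock", block, chunkList, false);
    }
}

// Models the reported behaviour: the flag is cleared on failure but never checked.
class UnguardedStateMachine {
    final ToyContainer container = new ToyContainer();
    boolean isStateMachineHealthy = true;

    void applyTransaction(Txn txn) {
        // No health check here, so later transactions are applied regardless.
        try {
            if (txn.type.equals("WriteChunk")) {
                container.writeChunk(txn.id, txn.fail);
            } else {
                container.putBlock(txn.id, txn.chunkList);
            }
        } catch (RuntimeException e) {
            isStateMachineHealthy = false; // set, but ignored by later calls
        }
    }
}
```

For example, applying `Txn.writeChunk("c1", true)` followed by `Txn.putBlock("b1", List.of("c1"))` leaves `isStateMachineHealthy` false, yet block `b1` is committed and `isConsistent()` returns false.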
When the flag is healthy, it allows the applyTransactionIndex persisted on disk
to be updated; otherwise the applyTransactionIndex is not updated. This means
that if a ContainerStateMachine goes into an unhealthy state, transactions are
replayed from the point of failure on restart, including transactions that were
already applied after the failure. This can introduce many inconsistencies
unless every transaction is idempotent (ensuring this would require going
through every kind of operation performed on the datanode and validating it).
An easier fix would be to check the isStateMachineHealthy flag at the
beginning of every applyTransaction.
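A minimal sketch of the proposed guard, again in Java and under stated assumptions: `GuardedStateMachine` is a hypothetical class, the `Runnable` stands in for the container operation carried by a transaction, and `lastAppliedIndex` stands in for the applyTransactionIndex persisted on disk. It returns a `CompletableFuture`, loosely mirroring the asynchronous shape of Ratis' `applyTransaction`, but it does not use the Ratis API and is not the actual Ozone patch.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical, single-threaded sketch of checking the health flag up front.
class GuardedStateMachine {
    // Cleared on the first failed apply; never reset in this sketch.
    private volatile boolean isStateMachineHealthy = true;
    // Stand-in for the applyTransactionIndex persisted on disk.
    private long lastAppliedIndex = 0;

    CompletableFuture<Long> applyTransaction(long index, Runnable operation) {
        // Proposed check: once unhealthy, reject every subsequent transaction.
        if (!isStateMachineHealthy) {
            return CompletableFuture.failedFuture(new IllegalStateException(
                "state machine is unhealthy; rejecting transaction " + index));
        }
        try {
            operation.run();
        } catch (RuntimeException e) {
            isStateMachineHealthy = false;
            return CompletableFuture.failedFuture(e);
        }
        // Only a successful apply on a healthy state machine advances the index.
        lastAppliedIndex = index;
        return CompletableFuture.completedFuture(index);
    }

    boolean isHealthy() {
        return isStateMachineHealthy;
    }

    long getLastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

Once the transaction at index 2 fails, a call for index 3 completes exceptionally without running its operation, and `getLastAppliedIndex()` stays at 1. In this model no transaction after the failure has been applied, so replaying from the failed index on restart does not re-apply work that was already done.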
--
This message was sent by Atlassian Jira
(v8.20.10#820010)