Swaminathan Balachandran created HDDS-12236:
-----------------------------------------------

             Summary: ContainerStateMachine should not apply future 
transactions in the event of failure
                 Key: HDDS-12236
                 URL: https://issues.apache.org/jira/browse/HDDS-12236
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Swaminathan Balachandran
            Assignee: Swaminathan Balachandran


Currently, when an apply transaction fails on ContainerStateMachine, the 
[isStateMachineHealthy|https://github.com/apache/ozone/blob/bf19af946aa08d2bea5064d79513a196b9bbf646/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L568]
 flag is set to false. However, when the next set of transactions is applied, 
this flag is not checked, so all the remaining transactions in the pipeline can 
still be applied to the state machine, which could leave the state machine 
in an inconsistent state. For instance, if a write chunk fails and the next 
putBlock transaction succeeds, the container ends up in an 
inconsistent state. 

When this flag is healthy, the applyTransactionIndex persisted on disk is 
updated; otherwise the applyTransactionIndex is not updated. This means 
that if a ContainerStateMachine goes into an unhealthy state, transactions 
would be replayed from the point of failure on restart, which could introduce 
many inconsistencies unless all the transactions are idempotent (to ensure this, 
we would have to go through every kind of operation performed on the 
datanode and validate it). An easier fix would be to check the 
isStateMachineHealthy flag at the beginning of every applyTransaction.
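
The proposed check could look roughly like the sketch below. It is a simplified, hypothetical stand-in written for illustration only: the class GuardedStateMachine, its fields, and the Supplier-based applyTransaction signature are assumptions, not the actual ContainerStateMachine or Ratis API. It only shows the intended behaviour: once one transaction fails, later transactions are rejected before they run, and the applied index stops advancing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical, simplified model of a state machine with a health guard. */
class GuardedStateMachine {
  // Assumed stand-in for the isStateMachineHealthy flag.
  private final AtomicBoolean healthy = new AtomicBoolean(true);
  // Assumed stand-in for the applied transaction index persisted on disk.
  private final AtomicLong lastAppliedIndex = new AtomicLong(-1);

  CompletableFuture<String> applyTransaction(long index, Supplier<String> op) {
    // Proposed fix: check the flag before doing any work.
    if (!healthy.get()) {
      CompletableFuture<String> rejected = new CompletableFuture<>();
      rejected.completeExceptionally(new IllegalStateException(
          "State machine is unhealthy; rejecting transaction " + index));
      return rejected;
    }
    try {
      String result = op.get();
      // Only a successful apply advances the index.
      lastAppliedIndex.set(index);
      return CompletableFuture.completedFuture(result);
    } catch (RuntimeException e) {
      // A failed apply marks the state machine unhealthy.
      healthy.set(false);
      CompletableFuture<String> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      return failed;
    }
  }

  boolean isHealthy() {
    return healthy.get();
  }

  long getLastAppliedIndex() {
    return lastAppliedIndex.get();
  }
}
```

In this sketch, a failed write chunk at index 2 would cause a following putBlock at index 3 to be rejected without running, so the applied index stays at 1 and the two operations cannot diverge.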



