Matthias Pohl created FLINK-34940:
-------------------------------------

             Summary: LeaderContender implementations handle invalid state
                 Key: FLINK-34940
                 URL: https://issues.apache.org/jira/browse/FLINK-34940
             Project: Flink
          Issue Type: Technical Debt
          Components: Runtime / Coordination
            Reporter: Matthias Pohl


Currently, LeaderContender implementations (e.g. see 
[ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219])
 handle consecutive leader events of the same type (e.g. two leadership grants 
with no revocation in between), which should never happen.

Two consecutive leadership grants indicate that the leading instance, which 
received the grant a second time, missed the leadership revocation event in 
between. This leaves the overall deployment in an invalid state (i.e. a 
split-brain scenario). We should fail fatally in these scenarios rather than 
handling them.
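
A minimal sketch of what such a check could look like. This is not Flink's 
actual code: StrictLeaderContender, LastEvent and the Consumer<Throwable> 
standing in for a fatal error handler are hypothetical names used for 
illustration only. The method names grantLeadership and revokeLeadership 
mirror the LeaderContender interface, but the class deliberately does not 
depend on Flink.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical stand-in for a LeaderContender implementation; not Flink code.
class StrictLeaderContender {

    // Type of the most recently accepted leader event.
    private enum LastEvent { NONE, GRANT, REVOKE }

    // Assumption: a callback that would terminate the process in a real deployment.
    private final Consumer<Throwable> fatalErrorHandler;

    private LastEvent lastEvent = LastEvent.NONE;
    private UUID leaderSessionId; // null while this instance is not the leader

    StrictLeaderContender(Consumer<Throwable> fatalErrorHandler) {
        this.fatalErrorHandler = fatalErrorHandler;
    }

    synchronized void grantLeadership(UUID newLeaderSessionId) {
        if (lastEvent == LastEvent.GRANT) {
            // Two grants in a row: the revocation in between was missed.
            fatalErrorHandler.accept(new IllegalStateException(
                    "Leadership granted twice without a revocation in between "
                            + "(current session: " + leaderSessionId + ")."));
            return;
        }
        lastEvent = LastEvent.GRANT;
        leaderSessionId = newLeaderSessionId;
    }

    synchronized void revokeLeadership() {
        if (lastEvent == LastEvent.REVOKE) {
            // Two revocations in a row: the grant in between was missed.
            fatalErrorHandler.accept(new IllegalStateException(
                    "Leadership revoked twice without a grant in between."));
            return;
        }
        lastEvent = LastEvent.REVOKE;
        leaderSessionId = null;
    }

    synchronized boolean isLeader() {
        return leaderSessionId != null;
    }
}
```

In this sketch the invalid event is reported and otherwise ignored, so the 
contender's state is left untouched. An actual implementation would be 
expected to shut the process down via the fatal error handler instead of 
continuing.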



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
