Matthias Pohl created FLINK-34940:
-------------------------------------

             Summary: LeaderContender implementations handle invalid state
                 Key: FLINK-34940
                 URL: https://issues.apache.org/jira/browse/FLINK-34940
             Project: Flink
          Issue Type: Technical Debt
          Components: Runtime / Coordination
            Reporter: Matthias Pohl

Currently, LeaderContender implementations (e.g. see [ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219]) accept consecutive leader events of the same type, which should never happen. Two consecutive leadership grants indicate that the instance receiving the second grant missed the intervening leadership revocation event, which leaves the overall deployment in an invalid state (i.e. a split-brain scenario). We should fail fatally in these scenarios rather than handle them.
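
A minimal sketch of what failing fatally could look like, assuming a contender that remembers the last leadership event it saw. The class name StrictLeaderContender, the LeaderEvent enum and the Consumer-based fatal error callback are hypothetical names used for illustration only; this is not Flink's actual LeaderContender code, and a real implementation would have to fit the existing contenders.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical sketch, not Flink's actual implementation.
final class StrictLeaderContender {

    // Type of the most recent leadership event (hypothetical helper enum).
    private enum LeaderEvent { GRANT, REVOKE }

    private final Object lock = new Object();

    // Assumed callback that would terminate the process in a real deployment.
    private final Consumer<Throwable> fatalErrorHandler;

    // null until the first leadership event arrives.
    private LeaderEvent lastEvent;

    // Session ID of the current leadership; null while not leading.
    private UUID leaderSessionId;

    StrictLeaderContender(Consumer<Throwable> fatalErrorHandler) {
        this.fatalErrorHandler = fatalErrorHandler;
    }

    void grantLeadership(UUID newLeaderSessionId) {
        synchronized (lock) {
            if (lastEvent == LeaderEvent.GRANT) {
                // A second grant without a revocation in between means the
                // revocation was missed: report instead of handling it.
                fatalErrorHandler.accept(new IllegalStateException(
                        "Leadership granted twice without revocation; current session: "
                                + leaderSessionId + ", new session: " + newLeaderSessionId));
                return;
            }
            lastEvent = LeaderEvent.GRANT;
            leaderSessionId = newLeaderSessionId;
        }
    }

    void revokeLeadership() {
        synchronized (lock) {
            if (lastEvent == LeaderEvent.REVOKE) {
                // Two revocations in a row are equally unexpected.
                fatalErrorHandler.accept(new IllegalStateException(
                        "Leadership revoked twice without a grant in between."));
                return;
            }
            lastEvent = LeaderEvent.REVOKE;
            leaderSessionId = null;
        }
    }

    boolean hasLeadership() {
        synchronized (lock) {
            return leaderSessionId != null;
        }
    }
}
```

The callback is injected so that a test can collect the reported error instead of terminating the JVM. Whether a revocation that arrives before any grant should also count as fatal is left open here; the sketch only rejects two events of the same type in a row.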