[
https://issues.apache.org/jira/browse/FLINK-26029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiangjie Qin reassigned FLINK-26029:
------------------------------------
Assignee: Dong Lin
> Generalize the checkpoint protocol of OperatorCoordinator.
> ----------------------------------------------------------
>
> Key: FLINK-26029
> URL: https://issues.apache.org/jira/browse/FLINK-26029
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.3
> Reporter: Jiangjie Qin
> Assignee: Dong Lin
> Priority: Major
> Labels: extensibility
> Fix For: 1.16.0
>
>
> Currently the JM opens all the event valves from the OperatorCoordinator to
> the subtasks after the checkpoint barriers are sent to the Source subtasks.
> While this works for the Source Operators, it unnecessarily limits general
> usage of the OperatorCoordinator for other operators.
> To generalize the protocol, we can change the JM to open the event valve of
> a subtask only after that subtask has finished its local checkpoint. The
> protocol would then become the following:
> # Let the OC finish processing all the incoming OperatorEvents before the
> snapshot.
> # Wait until all the outgoing OperatorEvents before the snapshot are sent
> and acked.
> # Shut the event valve so no outgoing events can be sent to the subtasks.
> # Send checkpoint barriers to the Source operators.
> # Open the corresponding event valve of a subtask when the
> AcknowledgeCheckpoint message from that subtask is received.
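
Steps 3 and 5 of the protocol above can be illustrated with a small toy
model in Java. This is only a sketch under stated assumptions, not Flink's
implementation: the class SubtaskEventValves and all of its methods are
hypothetical names invented for this example, events are plain strings, and
the sketch assumes that events sent while a valve is shut are buffered and
flushed when it reopens. Steps 1, 2 and 4 (draining incoming events, waiting
for acks of outgoing events, and sending barriers) are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of per-subtask event valves; hypothetical, not a Flink class. */
final class SubtaskEventValves {
    private final boolean[] shut;               // one valve per subtask
    private final List<List<String>> buffered;  // events held back while a valve is shut
    private final List<List<String>> delivered; // events that reached each subtask

    SubtaskEventValves(int parallelism) {
        shut = new boolean[parallelism];
        buffered = new ArrayList<>();
        delivered = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buffered.add(new ArrayList<>());
            delivered.add(new ArrayList<>());
        }
    }

    /** Step 3: shut every valve before the checkpoint barriers are sent. */
    void shutAll() {
        Arrays.fill(shut, true);
    }

    /** Delivers the event, or buffers it (an assumption) if the valve is shut. */
    void send(int subtask, String event) {
        (shut[subtask] ? buffered : delivered).get(subtask).add(event);
    }

    /** Step 5: an ack from one subtask reopens only that subtask's valve. */
    void onCheckpointAcknowledged(int subtask) {
        shut[subtask] = false;
        delivered.get(subtask).addAll(buffered.get(subtask));
        buffered.get(subtask).clear();
    }

    /** Events that have actually reached the given subtask so far. */
    List<String> deliveredTo(int subtask) {
        return delivered.get(subtask);
    }
}
```

In this model, after shutAll(), an event sent to subtask 0 stays buffered
until onCheckpointAcknowledged(0) is called, while subtask 1 remains blocked
until its own acknowledgement arrives. That per-subtask reopening is the
difference from the current behavior, where all valves are opened together.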