[ 
https://issues.apache.org/jira/browse/FLINK-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-16986.
----------------------------------
    Resolution: Fixed

Fixed in 

1.11.0 via
  - 05d0b928f0dc296d69d43eac353f74d2a0077f7a
  - 32c51c96880400fdd3ded0292b09e5c8d8ea28bb
  - 279e7ef678fc57a3ea4d014fb0248527da2a84d6
  - a4db69b2b2f6a7be8943666d944b4cec07f59f56
  - d671455cf54e141648a97f09880fe29926763753

1.12.0 (master) via
  - 1a721d855998a7c91415312d4890d7d4f916e163
  - b233aa84a1b0d8354211506219d0f785bccae870
  - 7ceee2395330c9c723ae83a85de2cf44f8378efc
  - 37f7db38290df065669178cc6407edd5055f1951
  - 5926e07f01f6c91a8a0265e5f4b30086572d8125


> Enhance the OperatorEvent handling guarantee during checkpointing.
> ------------------------------------------------------------------
>
>                 Key: FLINK-16986
>                 URL: https://issues.apache.org/jira/browse/FLINK-16986
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Jiangjie Qin
>            Assignee: Stephan Ewen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> When the {{CheckpointCoordinator}} takes a checkpoint, the checkpointing 
> order is as follows:
>  # {{CheckpointCoordinator}} triggers checkpoint on each 
> {{OperatorCoordinator}}
>  # Each {{OperatorCoordinator}} takes a snapshot.
>  # Right after taking the snapshot, the {{CheckpointCoordinator}} sends a 
> {{CHECKPOINT_FIN}} marker through the {{OperatorContext}}.
>  # Once the {{OperatorContext}} sees the {{CHECKPOINT_FIN}} marker, it waits 
> until all previous events are acked and suspends the event gateway to the 
> operators by buffering any future {{OperatorEvents}} sent from the 
> {{OperatorCoordinator}} to the operators without actually sending them out.
>  # The {{CheckpointCoordinator}} waits until all the {{OperatorCoordinator}}s 
> finish steps 2-4 and then triggers the task snapshots.
>  # The suspension of an event gateway to an operator can be lifted after all 
> the subtasks of that operator have finished their task checkpoint.
> The mechanism above guarantees all the {{OperatorEvents}} sent before taking 
> the operator coordinator snapshot are handled by the operator before the task 
> snapshots are taken.
> An operator can use this mechanism to know whether an {{OperatorEvent}} it 
> sent to the coordinator is included in the upcoming checkpoint or not. To do 
> so, it asks the operator coordinator to ACK that {{OperatorEvent}}. If 
> the ACK is received before the operator takes the next snapshot, that 
> {{OperatorEvent}} must have been handled and checkpointed by the 
> {{OperatorCoordinator}}.
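
The gateway suspension and the ACK reasoning described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Flink's actual implementation: the class and method names (BufferingEventGateway, onCheckpointFin, onTaskCheckpointsComplete) are hypothetical, events are plain strings, delivery is synchronous, and the wait for acks of in-flight events is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the event gateway from a coordinator to its operator. */
final class BufferingEventGateway {
    private final List<String> delivered = new ArrayList<>();
    private final Queue<String> buffered = new ArrayDeque<>();
    private boolean suspended = false;

    /** The coordinator sends an event; it is held back while the gateway is suspended. */
    void send(String event) {
        if (suspended) {
            buffered.add(event);
        } else {
            delivered.add(event);
        }
    }

    /** Step 4: the CHECKPOINT_FIN marker arrives, so later events are buffered. */
    void onCheckpointFin() {
        suspended = true;
    }

    /** Step 6: all subtasks finished their task checkpoint, so buffered events are released in order. */
    void onTaskCheckpointsComplete() {
        suspended = false;
        while (!buffered.isEmpty()) {
            delivered.add(buffered.poll());
        }
    }

    /** Events the operator has actually seen so far. */
    List<String> delivered() {
        return delivered;
    }

    /** Number of events currently held back. */
    int bufferedCount() {
        return buffered.size();
    }
}

final class GatewaySketchDemo {
    public static void main(String[] args) {
        BufferingEventGateway gateway = new BufferingEventGateway();
        gateway.send("ACK-1");        // event 1 was handled before the coordinator snapshot
        gateway.onCheckpointFin();    // coordinator snapshot taken, gateway suspended
        gateway.send("ACK-2");        // event 2 was handled after the snapshot, ACK is held back
        // The task snapshot would happen here: the operator has seen only ACK-1.
        System.out.println(gateway.delivered());
        gateway.onTaskCheckpointsComplete();
        System.out.println(gateway.delivered());
    }
}
```

In this sketch, an ACK that reaches the operator before its task snapshot can only have been sent before the coordinator snapshot, which matches the guarantee stated in the description. An ACK for an event handled after the coordinator snapshot stays buffered until the task checkpoints complete.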



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
