[ 
https://issues.apache.org/jira/browse/FLINK-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960847#comment-16960847
 ] 

Piotr Nowojski commented on FLINK-14344:
----------------------------------------

[~SleePy], as far as [~trohrmann] and I are aware, the only user of this 
feature is [the Pravega connector|https://github.com/pravega/flink-connectors], 
so we can ask them, or check the code directly, to find out which semantics 
they use/need.

I think the problem is that so far all of the hooks have been de facto 
synchronous, blocking the whole {{CheckpointCoordinator}} on any IO. However, 
this blocked just the {{CheckpointCoordinator}}. After the refactoring, it 
would block the whole {{JobManager}}, right? To complicate things further, even 
if we provide an asynchronous hook, there are two possible semantics:
# the hook is triggered asynchronously before the checkpoint starts, but 
checkpoint barriers are sent and the checkpoint starts before the async hook 
has completed
# the hook is triggered asynchronously, but the checkpoint is not started 
until the async action completes

The first one might be preferable for us, but I could imagine that some 
systems need the second one, to initialise something before the checkpoint is 
started by any of the operators. Again, maybe we can just research how 
Pravega is using it?
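
To make the two options concrete, here is a minimal sketch using only the JDK's {{CompletableFuture}}. The class name, method names and parameters are hypothetical and are not Flink APIs; {{hook}} stands for the master hook's IO work, {{sendBarriers}} for triggering the task checkpoints, and {{mainThread}} for whatever executor the coordinator would run on after the refactoring.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical illustration only; none of these names exist in Flink. */
final class HookSemanticsSketch {

    /**
     * Semantic 1: the hook is started on an IO executor and the barriers
     * are sent right away, without waiting for the hook to finish.
     * The returned future completes when the hook completes.
     */
    static CompletableFuture<Void> barriersWithoutWaiting(
            Runnable hook, Runnable sendBarriers, Executor ioExecutor) {
        CompletableFuture<Void> hookDone = CompletableFuture.runAsync(hook, ioExecutor);
        sendBarriers.run(); // caller's thread; the hook may still be running
        return hookDone;
    }

    /**
     * Semantic 2: the hook is started on an IO executor and the barriers
     * are sent only after it has completed. The caller is not blocked;
     * sendBarriers is scheduled on mainThread once the hook is done.
     */
    static CompletableFuture<Void> barriersAfterHook(
            Runnable hook, Runnable sendBarriers, Executor ioExecutor, Executor mainThread) {
        return CompletableFuture.runAsync(hook, ioExecutor)
                .thenRunAsync(sendBarriers, mainThread);
    }
}
```

In both variants the calling thread never waits on the hook's IO. The only difference is whether the barriers are ordered after the hook's completion, which is what a hook that initialises an external system would rely on.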

> Snapshot master hook state asynchronously
> -----------------------------------------
>
>                 Key: FLINK-14344
>                 URL: https://issues.apache.org/jira/browse/FLINK-14344
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we snapshot the master hook state synchronously. As part of 
> reworking the threading model of {{CheckpointCoordinator}}, we have to make 
> this non-blocking to satisfy the requirement of running in the main thread.
> Snapshotting the master hook state is similar to snapshotting task state. 
> The master state snapshot is taken before task state snapshotting, because 
> a master hook might initialize an external system that task state 
> snapshotting depends on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
