[ 
https://issues.apache.org/jira/browse/FLINK-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949578#comment-16949578
 ] 

Biao Liu commented on FLINK-14344:
----------------------------------

Hi [~pnowojski], thanks for the feedback.

Regarding the current code, yes, you are right. It confuses me as well. 
There is a comment before the waiting part of 
{{MasterHooks#triggerMasterHooks}}. It says "in the future we want to make this 
asynchronous with futures (no pun intended)". I think it means the hook was 
meant to be asynchronous, but that just has not been done yet for some reason. 
So I guess it should be asynchronous by design.
{quote}I'm not sure why should we execute the hooks in the IO Executor, why not 
master thread?
{quote}
Actually, at the very beginning I thought it should be executed in the main 
thread. However, I found a comment on 
{{MasterTriggerRestoreHook#triggerCheckpoint}}: "If the action by this hook 
needs to be executed synchronously, then this method should directly execute 
the action synchronously and block until it is complete". Based on this 
description, I'm afraid there is a risk of executing a blocking operation 
in the main thread. So I tried executing it in an IO thread, but that brings 
another risk: deadlock.

It's hard to make the right decision without clear semantics. I have started a 
survey on both the dev and user mailing lists to find out how developers and 
users use this interface. There is no response yet. I guess not many users 
depend on it.

If there is no objection in the next few days, I intend to treat it as 
asynchronous and execute it in the main thread. The comment would then need to 
change: remove the "blocking until it is complete" part and emphasize that 
the method should be non-blocking.

If there are objections and we can't reach an agreement, then we 
could make it synchronous by changing the signature of the method (removing the 
executor and the completable future).
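
The two options differ only in the method shape, roughly as sketched below. These are simplified, hypothetical interfaces ({{AsyncHook}}, {{SyncHook}}) modeled on the description above, not the actual Flink declarations. The {{offload}} helper is also an assumption: it shows one way a caller might still keep the main thread free if the hook itself became synchronous.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

public class HookSignatureSketch {

    /** Asynchronous shape: the hook gets an executor and returns a future. */
    public interface AsyncHook<T> {
        CompletableFuture<T> triggerCheckpoint(long checkpointId, long timestamp, Executor executor)
                throws Exception;
    }

    /** Synchronous shape: no executor, no future; the call blocks until the state is ready. */
    public interface SyncHook<T> {
        T triggerCheckpoint(long checkpointId, long timestamp) throws Exception;
    }

    /** Hypothetical adapter: runs a synchronous hook on the given executor. */
    public static <T> AsyncHook<T> offload(SyncHook<T> hook) {
        return (id, ts, executor) -> CompletableFuture.supplyAsync(() -> {
            try {
                return hook.triggerCheckpoint(id, ts);
            } catch (Exception e) {
                // Surface checked exceptions through the returned future.
                throw new CompletionException(e);
            }
        }, executor);
    }

    public static void main(String[] args) throws Exception {
        SyncHook<String> sync = (id, ts) -> "state-" + id;
        AsyncHook<String> async = offload(sync);
        // Runnable::run is a direct executor, used here only to keep the demo small.
        System.out.println(async.triggerCheckpoint(7L, 0L, Runnable::run).get()); // prints "state-7"
    }
}
```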

What do you think?

> Snapshot master hook state asynchronously
> -----------------------------------------
>
>                 Key: FLINK-14344
>                 URL: https://issues.apache.org/jira/browse/FLINK-14344
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently we snapshot the master hook state synchronously. As a part of 
> reworking the threading model of {{CheckpointCoordinator}}, we have to make this 
> non-blocking to satisfy the requirement of running in the main thread.
> The behavior of snapshotting master hook state is similar to task state 
> snapshotting. The master state snapshot is taken before task state 
> snapshotting, because a master hook might initialize an external system 
> which task state snapshotting might depend on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
