[ https://issues.apache.org/jira/browse/FLINK-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949578#comment-16949578 ]
Biao Liu commented on FLINK-14344:
----------------------------------
Hi [~pnowojski], thanks for the feedback.
Regarding the current code, yes, you are right. It confuses me as well.
There is a comment just before the waiting part of
{{MasterHooks#triggerMasterHooks}}. It says "in the future we want to make this
asynchronous with futures (no pun intended)". I take that to mean it was meant
to be asynchronous, but that was never implemented for some reason. So I guess
it is asynchronous by design.
{quote}I'm not sure why should we execute the hooks in the IO Executor, why not
master thread?
{quote}
Actually, at the very beginning I thought it should be executed in the main
thread. However, I found a comment on
{{MasterTriggerRestoreHook#triggerCheckpoint}}: "If the action by this hook
needs to be executed synchronously, then this method should directly execute
the action synchronously and block until it is complete". Based on this
description, I'm afraid there is a risk of executing a blocking operation in
the main thread. So I tried executing it in an IO thread, but that introduces
another risk: deadlock.
It's hard to make the right decision without clear semantics. I have started a
survey on both the dev and user mailing lists to find out how developers and
users use this interface. There has been no response yet. I guess not many
users depend on it.
If there are no objections in the next few days, I intend to treat it as
asynchronous and execute it in the main thread. The comment would then need to
change: remove the "block until it is complete" wording and emphasize that the
method should be non-blocking.
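To illustrate the non-blocking option, here is a minimal sketch. The {{Hook}}
interface is a simplified stand-in for {{MasterTriggerRestoreHook}}, not the
real Flink interface, and {{AsyncHook}}, {{runOnce}} and the single-threaded
executor are hypothetical names chosen for the example. The hook hands its slow
work to the supplied executor and returns the future immediately, so the
calling thread is never blocked inside the hook.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingHookSketch {

    /** Simplified stand-in for MasterTriggerRestoreHook (assumption, not the Flink interface). */
    interface Hook<T> {
        CompletableFuture<T> triggerCheckpoint(long checkpointId, long timestamp, Executor executor);
    }

    /** Hypothetical hook that offloads its slow work and returns at once. */
    static final class AsyncHook implements Hook<String> {
        @Override
        public CompletableFuture<String> triggerCheckpoint(
                long checkpointId, long timestamp, Executor executor) {
            return CompletableFuture.supplyAsync(() -> {
                // Stand-in for a slow call to an external system.
                sleepQuietly(50);
                return "hook-state-" + checkpointId;
            }, executor);
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Triggers the hook once; the caller only waits here to make the demo observable. */
    static String runOnce(long checkpointId) throws Exception {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> state =
                    new AsyncHook().triggerCheckpoint(
                            checkpointId, System.currentTimeMillis(), ioExecutor);
            // A real caller would chain further stages on the future instead of waiting.
            return state.get(5, TimeUnit.SECONDS);
        } finally {
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(1L));
    }
}
```

In this sketch the only blocking call is the {{get}} in {{runOnce}}, which
exists purely for the demo.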
If there are objections and we cannot reach an agreement, then we could make it
synchronous by changing the method signature (removing the executor and the
completable future).
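For comparison, a synchronous signature could look roughly like the sketch
below. {{SyncHook}} is a hypothetical interface made up for this example, not
an existing or proposed Flink type. With no executor and no future, the method
simply returns the state, and the caller is responsible for choosing a thread
where blocking is acceptable.

```java
public class SynchronousHookSketch {

    /** Hypothetical synchronous variant: no executor parameter, no CompletableFuture. */
    interface SyncHook<T> {
        T triggerCheckpoint(long checkpointId, long timestamp) throws Exception;
    }

    /** Calls the hook directly; the call returns only once the hook has finished. */
    static String runOnce(long checkpointId) throws Exception {
        SyncHook<String> hook = (id, timestamp) -> "hook-state-" + id;
        return hook.triggerCheckpoint(checkpointId, System.currentTimeMillis());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(1L));
    }
}
```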
What do you think?
> Snapshot master hook state asynchronously
> -----------------------------------------
>
> Key: FLINK-14344
> URL: https://issues.apache.org/jira/browse/FLINK-14344
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing
> Reporter: Biao Liu
> Assignee: Biao Liu
> Priority: Major
> Fix For: 1.10.0
>
>
> Currently we snapshot the master hook state synchronously. As part of
> reworking the threading model of {{CheckpointCoordinator}}, we have to make this
> non-blocking to satisfy the requirement of running in the main thread.
> The behavior of snapshotting master hook state is similar to task state
> snapshotting. Master state is snapshotted before task state, because a
> master hook might perform external system initialization that task state
> snapshotting depends on.