[ 
https://issues.apache.org/jira/browse/FLINK-13905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964755#comment-16964755
 ] 

Piotr Nowojski commented on FLINK-13905:
----------------------------------------

I'm moving the discussion about this ticket here from FLINK-13848, so as not to clog that one.

[~SleePy]
{quote}
In brief, my solution is to introduce a queue of trigger requests. If the prior 
trigger request is not finished, the later one (including checkpoint and 
savepoint) will be kept in this queue.
{quote}
So if there is an ongoing chain of A->B->C, the periodic trigger would just 
enqueue a request in this queue; otherwise it would trigger "A". Then we would 
also need manual logic in A, B and C: if any of them fails, it re-checks the 
queue, and if "C" completes successfully, it re-checks the queue as well?
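
To make sure we are talking about the same thing, here is a minimal sketch of the flow as I understand it. All names ({{TriggerRequestQueue}}, {{trigger}}, {{onChainFinished}}) are hypothetical and are not existing {{CheckpointCoordinator}} API; requests are plain strings, the A->B->C stages are collapsed into one "chain in progress" flag, and everything is assumed to run in a single thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of the proposed request queue; not Flink code. */
class TriggerRequestQueue {
    // Requests that arrived while a chain A->B->C was still running.
    private final Queue<String> pending = new ArrayDeque<>();
    // Requests for which stage "A" has been started, in start order.
    private final List<String> started = new ArrayList<>();
    private boolean chainInProgress = false;

    /** Called by the periodic trigger or by a manual savepoint request. */
    void trigger(String request) {
        if (chainInProgress) {
            pending.add(request);   // ongoing chain: only enqueue
        } else {
            startChain(request);    // idle: trigger "A" right away
        }
    }

    /** Must be called when A, B or C fails, and when C completes. */
    void onChainFinished() {
        chainInProgress = false;
        String next = pending.poll(); // re-check the queue
        if (next != null) {
            startChain(next);
        }
    }

    private void startChain(String request) {
        chainInProgress = true;
        started.add(request);       // stands in for actually running A->B->C
    }

    List<String> started() {
        return started;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The point of the sketch is that every exit path of the chain (a failure in any stage, or success of the last one) has to call {{onChainFinished}}, otherwise queued requests are never picked up.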

Isn't it almost the same logic as scheduling the next checkpoint with a delay 
manually from A, B or C? Without the need for FLINK-13848? Side note, haven't 
you implemented something similar or exactly this in one of the PRs, in a 
commit that was ultimately dropped?

In the end, what do you think would be an easier/cleaner/better approach to 
solve this? 

> Separate checkpoint triggering into stages
> ------------------------------------------
>
>                 Key: FLINK-13905
>                 URL: https://issues.apache.org/jira/browse/FLINK-13905
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently {{CheckpointCoordinator#triggerCheckpoint}} includes some heavy IO 
> operations. We plan to separate the triggering into different stages. The IO 
> operations are executed in IO threads, while the other in-memory operations 
> are not.
> This is a preparation for making all in-memory operations of 
> {{CheckpointCoordinator}} single-threaded (in the main thread).
> Note that we cannot yet put the in-memory operations of triggering directly 
> into the main thread, because some of them still run under a heavy 
> (coordinator-wide) lock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
