[jira] [Comment Edited] (FLINK-23430) Do not take snapshot for operator coordinators which all tasks finished

Dawid Wysakowicz (Jira) Tue, 03 Aug 2021 05:00:06 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392260#comment-17392260
 ]


Dawid Wysakowicz edited comment on FLINK-23430 at 8/3/21, 11:59 AM:
--------------------------------------------------------------------

I agree it is more of an optimization. We could simply keep snapshotting the 
state of such coordinators, though we would then need to loosen the 
restriction we added.

> Also I would ask a question: why are {{OperatorCoordinator}}s still running 
> if all of their operators have finished/closed?

This is a good question, and I don't know the answer. I guess it simply has 
not been implemented so far.

> Also I would ask another question: during recovery, do/should we even start 
> an OperatorCoordinator if all of its operators already finished a long 
> time ago?

I'd treat that as an optimization as well. Right now, we also start subtasks 
that finished a long time ago, but we immediately move them to the 
closing/finishing phase.


was (Author: dawidwys):
I agree it is more of an optimization. We could simply keep snapshotting the 
state of such coordinators.

> Also I would ask a question: why are {{OperatorCoordinator}}s still running 
> if all of their operators have finished/closed?

This is a good question, and I don't know the answer. I guess it simply has 
not been implemented so far.

> Also I would ask another question: during recovery, do/should we even start 
> an OperatorCoordinator if all of its operators already finished a long 
> time ago?

I'd treat that as an optimization as well. Right now, we also start subtasks 
that finished a long time ago, but we immediately move them to the 
closing/finishing phase.

> Do not take snapshot for operator coordinators which all tasks finished
> -----------------------------------------------------------------------
>
>                 Key: FLINK-23430
>                 URL: https://issues.apache.org/jira/browse/FLINK-23430
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently we trigger checkpoints for all operator coordinators, irrespective 
> of whether their corresponding tasks have finished. This leads, e.g., to a 
> precondition in 
> {{org.apache.flink.runtime.checkpoint.PendingCheckpoint#fulfillFullyFinishedOperatorStates}}
> failing.
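
The filtering that the issue title asks for could look roughly like the sketch below. It is a self-contained illustration only, not Flink's implementation: {{CoordinatorInfo}}, {{allTasksFinished}} and {{coordinatorsToSnapshot}} are hypothetical names introduced here, and the actual logic in the checkpointing runtime is not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoordinatorSnapshotSketch {

    /** Hypothetical stand-in for an operator coordinator plus the state of its subtasks. */
    static final class CoordinatorInfo {
        final String operatorId;
        final boolean[] subtaskFinished;

        CoordinatorInfo(String operatorId, boolean... subtaskFinished) {
            this.operatorId = operatorId;
            this.subtaskFinished = subtaskFinished;
        }

        /** True only if every subtask has finished (vacuously true with no subtasks). */
        boolean allTasksFinished() {
            for (boolean finished : subtaskFinished) {
                if (!finished) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Keeps only the coordinators that still have at least one unfinished subtask. */
    static List<CoordinatorInfo> coordinatorsToSnapshot(List<CoordinatorInfo> all) {
        List<CoordinatorInfo> result = new ArrayList<>();
        for (CoordinatorInfo coordinator : all) {
            if (!coordinator.allTasksFinished()) {
                result.add(coordinator);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CoordinatorInfo> all = Arrays.asList(
                new CoordinatorInfo("source", true, true),   // fully finished: skipped
                new CoordinatorInfo("map", true, false),     // partially finished: kept
                new CoordinatorInfo("sink", false, false));  // still running: kept

        for (CoordinatorInfo coordinator : coordinatorsToSnapshot(all)) {
            System.out.println("snapshot coordinator of operator: " + coordinator.operatorId);
        }
    }
}
```

The alternative discussed in the comment above, continuing to snapshot every coordinator, would drop this filter and instead relax the failing precondition.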



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
