[ https://issues.apache.org/jira/browse/FLINK-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392260#comment-17392260 ]
Dawid Wysakowicz edited comment on FLINK-23430 at 8/3/21, 11:59 AM:
--------------------------------------------------------------------
I agree it is kind of an optimization. We could also just keep snapshotting the
state of such coordinators, but then we would need to loosen the restriction we added.
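A minimal sketch of what loosening that restriction could mean. It assumes, which the issue does not state, that the restriction is a precondition requiring that no state at all has been reported for a fully finished operator. {{FinishedOperatorStateCheck}}, {{OperatorSnapshot}}, {{strictCheck}} and {{loosenedCheck}} are hypothetical names, not Flink's {{PendingCheckpoint}} API:

```java
// Hypothetical model, NOT Flink's PendingCheckpoint code. It assumes the
// restriction is "a fully finished operator must not have any reported state".
final class FinishedOperatorStateCheck {

    /** Hypothetical summary of what was reported for one operator in a checkpoint. */
    static final class OperatorSnapshot {
        final boolean hasCoordinatorState;
        final boolean hasSubtaskState;

        OperatorSnapshot(boolean hasCoordinatorState, boolean hasSubtaskState) {
            this.hasCoordinatorState = hasCoordinatorState;
            this.hasSubtaskState = hasSubtaskState;
        }
    }

    /** Strict precondition: nothing may be reported for a fully finished operator. */
    static boolean strictCheck(OperatorSnapshot reported) {
        return reported == null;
    }

    /** Loosened precondition: coordinator state is tolerated, subtask state is not. */
    static boolean loosenedCheck(OperatorSnapshot reported) {
        return reported == null || !reported.hasSubtaskState;
    }

    public static void main(String[] args) {
        // A coordinator that kept snapshotting after all of its subtasks finished.
        OperatorSnapshot coordinatorOnly = new OperatorSnapshot(true, false);
        System.out.println(strictCheck(coordinatorOnly));   // false: the precondition would fail
        System.out.println(loosenedCheck(coordinatorOnly)); // true: the snapshot is accepted
    }
}
```

With the loosened variant, coordinators could keep being snapshotted unconditionally, while state reported by subtasks of a fully finished operator would still be rejected.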
> Also, I would ask: why are {{OperatorCoordinator}}s still running
> if all of their operators have finished/closed?
This is a good question, to which I don't know the answer. I guess it simply
has not been implemented so far.
> Also, I would ask another question: during recovery, do/should we even start
> an OperatorCoordinator if all of its operators already finished a long
> time ago?
I'd treat that as an optimization as well. Right now, we also start subtasks
that finished a long time ago, but we immediately move them to the
closing/finishing phase.
> Do not take snapshot for operator coordinators which all tasks finished
> -----------------------------------------------------------------------
>
> Key: FLINK-23430
> URL: https://issues.apache.org/jira/browse/FLINK-23430
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing
> Reporter: Dawid Wysakowicz
> Assignee: Dawid Wysakowicz
> Priority: Major
> Fix For: 1.14.0
>
>
> Currently we trigger checkpoints for all operator coordinators irrespective
> of whether their corresponding tasks have finished. This leads, e.g., to a
> precondition in
> {{org.apache.flink.runtime.checkpoint.PendingCheckpoint#fulfillFullyFinishedOperatorStates}}
> failing.
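The direction in the issue title can be sketched as a filter applied before triggering coordinator checkpoints: only coordinators whose operator still has at least one unfinished subtask are asked for a snapshot. This is a minimal sketch on a hypothetical model; {{CoordinatorSnapshotFilter}}, {{SubtaskState}} and {{coordinatorsToCheckpoint}} are illustrative names, not Flink's {{CheckpointCoordinator}} or {{ExecutionState}} API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT Flink's API: each coordinator (by name) maps to the
// states of the subtasks of the operator it coordinates.
final class CoordinatorSnapshotFilter {

    /** Simplified, hypothetical subtask state. */
    enum SubtaskState { RUNNING, FINISHED }

    /** Returns the coordinators that still have at least one unfinished subtask. */
    static List<String> coordinatorsToCheckpoint(Map<String, List<SubtaskState>> subtasksByCoordinator) {
        List<String> toTrigger = new ArrayList<>();
        for (Map.Entry<String, List<SubtaskState>> entry : subtasksByCoordinator.entrySet()) {
            // Note: allMatch on an empty list is true, so a coordinator without
            // subtasks is treated as fully finished and skipped as well.
            boolean allFinished = entry.getValue().stream()
                    .allMatch(state -> state == SubtaskState.FINISHED);
            // Skip coordinators whose subtasks have all finished, so no coordinator
            // state is reported for a fully finished operator.
            if (!allFinished) {
                toTrigger.add(entry.getKey());
            }
        }
        return toTrigger;
    }

    public static void main(String[] args) {
        Map<String, List<SubtaskState>> subtasks = new LinkedHashMap<>();
        subtasks.put("source-coordinator", List.of(SubtaskState.FINISHED, SubtaskState.FINISHED));
        subtasks.put("sink-coordinator", List.of(SubtaskState.FINISHED, SubtaskState.RUNNING));
        System.out.println(coordinatorsToCheckpoint(subtasks)); // [sink-coordinator]
    }
}
```

Filtering at trigger time keeps the existing precondition intact; the alternative discussed in the comment is to keep triggering every coordinator and relax the precondition instead.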
--
This message was sent by Atlassian Jira
(v8.3.4#803005)