Stephan Ewen created FLINK-18429:
------------------------------------
Summary: Add default method for
CheckpointListener.notifyCheckpointAborted(checkpointId)
Key: FLINK-18429
URL: https://issues.apache.org/jira/browse/FLINK-18429
Project: Flink
Issue Type: Bug
Components: API / DataStream
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.11.0
The {{CheckpointListener}} interface is implemented by many users. Adding a new
method {{notifyCheckpointAborted(long)}} to the interface without a default
implementation breaks many user programs, because their implementing classes
no longer compile.
We should turn this method into a default method, for two reasons:
- It avoids breaking existing programs.
- In practice, it is less relevant for programs to react to checkpoints being
aborted than to checkpoints being completed. On completion you often want to
commit side effects, while on abort you frequently do nothing and let the next
successful checkpoint commit all changes up to that point.
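A minimal sketch of the proposed change, assuming a local stand-in interface instead of Flink's real {{CheckpointListener}} (so it compiles without Flink on the classpath). The class names are hypothetical and the exact Flink signatures may differ. It shows that a user class written before the abort callback existed still compiles and simply inherits a no-op.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultAbortExample {

    // Local stand-in for Flink's CheckpointListener (assumption: not the real class).
    interface CheckpointListener {
        // Existing callback that user code already implements.
        void notifyCheckpointComplete(long checkpointId) throws Exception;

        // New callback with an empty default body, so existing
        // implementations keep compiling without overriding it.
        default void notifyCheckpointAborted(long checkpointId) throws Exception {}
    }

    // Hypothetical "legacy" user class written before the abort callback existed.
    static class LegacyListener implements CheckpointListener {
        final List<Long> completed = new ArrayList<>();

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            completed.add(checkpointId);
        }
    }

    public static void main(String[] args) throws Exception {
        LegacyListener listener = new LegacyListener();
        listener.notifyCheckpointComplete(1L);
        listener.notifyCheckpointAborted(2L); // falls through to the no-op default
        System.out.println(listener.completed); // prints [1]
    }
}
```

Without the {{default}} keyword, {{LegacyListener}} would fail to compile until every such user class added an override.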
*Original Confusion*
The original confusion goes back to a comment of mine suggesting that this
should not be a default method, because I incorrectly thought of it as an
internal interface:
https://github.com/apache/flink/pull/8693#issuecomment-542834147
See also the clarifying email on the mailing list:
{noformat}
About the "notifyCheckpointAborted()":
When I wrote that comment, I was (apparently wrongly) assuming we were talking
about an internal interface here, because the "abort" signal was originally
only intended to cancel the async part of state backend checkpoints.
I just realized that this is exposed to users - and I am actually with Thomas
on this one. The "CheckpointListener" is a very public interface that many
users implement. The fact that it is tagged "@PublicEvolving" is somehow not
aligned with reality. So adding the method here will in reality break lots and
lots of user programs.
I also think that in practice it is much less relevant for user applications to
react to aborted checkpoints. Since the notifications there cannot be relied
upon (if there is a task failure concurrently), users always have to follow the
"newer checkpoint subsumes older checkpoint" contract, so the abort method is
probably rarely relevant.
This is something we should change, in my opinion.
{noformat}
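A minimal sketch of the "newer checkpoint subsumes older checkpoint" contract, again assuming a local stand-in interface rather than Flink's real class; {{StagingSink}} and {{stage}} are hypothetical names. The sink never overrides the abort callback: whatever an aborted (or never-notified) checkpoint staged is committed by the next checkpoint that completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SubsumingCommitExample {

    // Local stand-in for CheckpointListener (assumption: not the real Flink class).
    interface CheckpointListener {
        void notifyCheckpointComplete(long checkpointId) throws Exception;

        default void notifyCheckpointAborted(long checkpointId) throws Exception {}
    }

    // Hypothetical sink that stages side effects per checkpoint and commits
    // them only when a checkpoint completes.
    static class StagingSink implements CheckpointListener {
        // Uncommitted writes, keyed by the checkpoint that staged them.
        private final NavigableMap<Long, List<String>> pending = new TreeMap<>();
        final List<String> committed = new ArrayList<>();

        // In a real sink this would typically happen while snapshotting state;
        // here it is called directly.
        void stage(long checkpointId, String record) {
            pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(record);
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            // Commit everything staged up to and including this checkpoint, in order.
            NavigableMap<Long, List<String>> upTo = pending.headMap(checkpointId, true);
            for (List<String> records : upTo.values()) {
                committed.addAll(records);
            }
            upTo.clear(); // the view is backed by 'pending', so this removes them
        }
    }

    public static void main(String[] args) throws Exception {
        StagingSink sink = new StagingSink();
        sink.stage(1L, "a");
        sink.notifyCheckpointAborted(1L);  // no-op default: "a" stays pending
        sink.stage(2L, "b");
        sink.notifyCheckpointComplete(2L); // commits "a" and "b"
        System.out.println(sink.committed); // prints [a, b]
    }
}
```

Because completion of checkpoint 2 also commits what checkpoint 1 staged, the sink stays correct whether or not the abort notification ever arrives.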
--
This message was sent by Atlassian Jira
(v8.3.4#803005)