[
https://issues.apache.org/jira/browse/FLINK-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Nowojski reassigned FLINK-17350:
--------------------------------------
Assignee: Piotr Nowojski
> StreamTask should always fail immediately on failures in synchronous part of
> a checkpoint
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-17350
> URL: https://issues.apache.org/jira/browse/FLINK-17350
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / Task
> Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.2, 1.10.0
> Reporter: Piotr Nowojski
> Assignee: Piotr Nowojski
> Priority: Critical
> Fix For: 1.11.0
>
>
> This bug also affects the 1.5.x branch.
> As described in point 1 here:
> https://issues.apache.org/jira/browse/FLINK-17327?focusedCommentId=17090576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090576
> {{setTolerableCheckpointFailureNumber(...)}} and its deprecated
> {{setFailTaskOnCheckpointError(...)}} predecessor are implemented
> incorrectly. Since Flink 1.5
> (https://issues.apache.org/jira/browse/FLINK-4809) they can lead to operators
> (and especially sinks with an external state) ending up in an inconsistent
> state. This is true even if they are not used, because of another issue:
> FLINK-17351.
> Consider what happens when this is combined with an intermittent external
> system failure: the sink reports an exception, the transaction is lost or
> aborted, and the sink is in a failed state. If the sink nevertheless happens
> to accept further records, the exception can be lost, and all of the records
> in those failed checkpoints are lost forever as well.
> For details please check FLINK-17327.
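
The failure mode described above can be illustrated with a small toy model in plain Java. This is only a sketch under stated assumptions, not Flink code: `ToyTransactionalSink`, `checkpoint`, `run` and the `failImmediately` flag are all hypothetical names invented for the illustration, and the "external system" is just a boolean. It shows how swallowing an exception from the synchronous part of a checkpoint can silently drop the records of the failed transaction, while failing immediately surfaces the error.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes exist in Flink; all names are hypothetical.
final class CheckpointFailureSketch {

    // A sink that buffers records in an open transaction and commits it on checkpoint.
    static final class ToyTransactionalSink {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        boolean externalSystemDown = false;

        void write(String record) {
            pending.add(record);
        }

        // Stands in for the synchronous part of a checkpoint.
        void snapshotState() {
            if (externalSystemDown) {
                pending.clear(); // the external transaction was aborted, its records are gone
                throw new IllegalStateException("transaction aborted by external system");
            }
            committed.addAll(pending);
            pending.clear();
        }
    }

    static void checkpoint(ToyTransactionalSink sink, boolean failImmediately) {
        try {
            sink.snapshotState();
        } catch (IllegalStateException e) {
            if (failImmediately) {
                throw e; // the task fails; a real job would then restart from the last checkpoint
            }
            // Tolerated failure: the exception is dropped and the task keeps running.
        }
    }

    static List<String> run(boolean failImmediately) {
        ToyTransactionalSink sink = new ToyTransactionalSink();
        sink.write("a");
        checkpoint(sink, failImmediately); // succeeds
        sink.write("b");
        sink.externalSystemDown = true;    // intermittent external failure
        checkpoint(sink, failImmediately); // fails
        sink.externalSystemDown = false;   // external system recovers
        sink.write("c");
        checkpoint(sink, failImmediately); // succeeds again if the task is still running
        return sink.committed;
    }

    public static void main(String[] args) {
        System.out.println("tolerated failure, committed: " + run(false));
        try {
            run(true);
        } catch (IllegalStateException e) {
            System.out.println("fail immediately: " + e.getMessage());
        }
    }
}
```

In the tolerated case the toy sink keeps accepting records after the failed checkpoint, so record "b" never reaches the committed list and no error remains visible. With `failImmediately` set, the exception propagates instead.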