[jira] [Assigned] (FLINK-17350) StreamTask should always fail immediately on failures in synchronous part of a checkpoint

Piotr Nowojski (Jira) Tue, 12 May 2020 05:21:00 -0700


     [ https://issues.apache.org/jira/browse/FLINK-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Piotr Nowojski reassigned FLINK-17350:
--------------------------------------

    Assignee: Piotr Nowojski

> StreamTask should always fail immediately on failures in synchronous part of 
> a checkpoint
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-17350
>                 URL: https://issues.apache.org/jira/browse/FLINK-17350
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Task
>    Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.2, 1.10.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> This bug also affects the 1.5.x branch.
> As described in point 1 here: 
> https://issues.apache.org/jira/browse/FLINK-17327?focusedCommentId=17090576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090576
> {{setTolerableCheckpointFailureNumber(...)}} and its deprecated
> predecessor {{setFailTaskOnCheckpointError(...)}} are implemented
> incorrectly. Since Flink 1.5
> (https://issues.apache.org/jira/browse/FLINK-4809), they can cause operators
> (especially sinks with external state) to end up in an inconsistent
> state. This is true even if they are not used, because of another issue:
> FLINK-17351.
> Combine this with an intermittent failure of the external system: the sink
> reports an exception, its transaction is lost or aborted, and the sink is in
> a failed state. If, by happy coincidence, it still manages to accept further
> records, the exception can be lost, and all of the records in those failed
> checkpoints are lost forever as well.
> For details, please see FLINK-17327.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
