[
https://issues.apache.org/jira/browse/FLINK-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Nowojski closed FLINK-10377.
----------------------------------
Fix Version/s: 1.9.2
1.8.3
1.10.0
Resolution: Fixed
Merged to master as {{33f5e330491ca979caedbd033418a17134f22cb6}}
to release-1.9 as {{cf82a145921d08fd8ac6e76d57b50060056d0f47}}
to release-1.8 as {{aa92ec568ce0ebb8c2e71d0d6069668a06fb91fb}}
> Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete
> --------------------------------------------------------------------------
>
> Key: FLINK-10377
> URL: https://issues.apache.org/jira/browse/FLINK-10377
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Common
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Stefan Richter
> Assignee: Stefan Richter
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The precondition {{checkState(pendingTransactionIterator.hasNext(),
> "checkpoint completed, but no transaction pending");}} in
> {{TwoPhaseCommitSinkFunction.notifyCheckpointComplete()}} seems too strict,
> because checkpoints can overtake checkpoints and will fail the precondition.
> In this case the commit was already performed by the first notification and
> subsumes the late checkpoint. I think the check can be removed.
> edit:
> As [reported by a user on the user mailing
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/UNCHECKED-Error-while-confirming-Checkpoint-td23213.html],
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} can fail with the
> following exception:
> {noformat}
> java.lang.RuntimeException: Error while confirming checkpoint
> at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: checkpoint completed, but no
> transaction pending
> at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> at
> org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
> at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
> ... 5 more
> {noformat}
> This can happen in the following scenario:
> # savepoint is triggered
> # checkpoint is triggered
> # checkpoint completes (but it doesn't subsume the savepoint, because
> checkpoints subsume only other checkpoints).
> # savepoint completes
> In this case, {{TwoPhaseCommitSinkFunction}} receives first notification that
> the later checkpoint completed, it commits both savepoint and the checkpoint.
> Later when savepoint {{notifyCheckpointComplete}} arrives, the above error
> will occur.
> Possible trivial fix is to remove that failing {{checkState}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)