[
https://issues.apache.org/jira/browse/FLINK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963560#comment-16963560
]
Congxian Qiu(klion26) edited comment on FLINK-13969 at 10/31/19 2:26 AM:
-------------------------------------------------------------------------
Hi [~trohrmann], from the log I think the events happened in the following order:
* triggerCheckpoint: eager pre-checks
* job cancel
* triggerCheckpoint: actual checkpoint trigger
All three steps are guarded by the lock, but the lock is released after step 1
and re-acquired in step 3.
We check whether the coordinator has been stopped in the eager pre-checks, but
not in the actual checkpoint trigger.
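To make the interleaving concrete, here is a minimal sketch of the check-then-act gap described above. The class and method names ({{CoordinatorSketch}}, {{eagerPreCheck}}, {{cancelJob}}, {{actualTrigger}}) and the {{recheckShutdown}} flag are hypothetical and only for illustration; this is not the actual CheckpointCoordinator code, and re-checking under the lock is just one possible remedy.

```java
/** Hypothetical stand-in for a coordinator; NOT Flink's CheckpointCoordinator. */
class CoordinatorSketch {
    private final Object lock = new Object();
    private boolean shutdown;      // guarded by lock
    private int triggeredCount;    // guarded by lock

    /** Step 1: eager pre-check. The lock is released when this method returns. */
    boolean eagerPreCheck() {
        synchronized (lock) {
            return !shutdown;
        }
    }

    /** Step 2: job cancel, which can run between step 1 and step 3. */
    void cancelJob() {
        synchronized (lock) {
            shutdown = true;
        }
    }

    /**
     * Step 3: actual trigger. The lock is re-acquired here, so the result of
     * step 1 may be stale. recheckShutdown (an assumed flag, used only to
     * compare both behaviours) decides whether the stopped state is re-validated.
     */
    boolean actualTrigger(boolean recheckShutdown) {
        synchronized (lock) {
            if (recheckShutdown && shutdown) {
                return false; // coordinator was stopped after the pre-check
            }
            triggeredCount++;
            return true;
        }
    }

    int getTriggeredCount() {
        synchronized (lock) {
            return triggeredCount;
        }
    }
}
```

Calling {{eagerPreCheck()}}, then {{cancelJob()}}, then {{actualTrigger(false)}} still triggers a checkpoint on a stopped coordinator, while {{actualTrigger(true)}} declines it.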
was (Author: klion26):
Hi [~trohrmann], from the log I think the events happened in the following order:
* triggerCheckpoint: eager pre-checks
* job cancel
* triggerCheckpoint: actual checkpoint trigger
We check whether the coordinator has been stopped in the eager pre-checks, but
not in the actual checkpoint trigger.
> Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end
> test fails on Travis
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-13969
> URL: https://issues.apache.org/jira/browse/FLINK-13969
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.10.0
> Reporter: Till Rohrmann
> Priority: Critical
> Labels: test-stability
> Fix For: 1.10.0
>
>
> The {{Resuming Externalized Checkpoint (rocks, incremental, scale down)}}
> end-to-end test fails on Travis because its log contains an exception:
> {code}
> org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 16 for operator ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (2/4). Failure reason: Checkpoint was declined.
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
> at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1302)
> at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1236)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:892)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:797)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:728)
> at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:88)
> at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:177)
> at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
> at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:118)
> at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:48)
> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:144)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:277)
> at org.apache.flink.streaming.runtime.tasks.mailbox.execution.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:147)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:404)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
> at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
> at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:122)
> at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:110)
> at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:104)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:170)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:126)
> at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:439)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:411)
> ... 17 more
> {code}
> https://api.travis-ci.org/v3/job/580915660/log.txt