[ 
https://issues.apache.org/jira/browse/FLINK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963560#comment-16963560
 ] 

Congxian Qiu(klion26) edited comment on FLINK-13969 at 10/31/19 2:26 AM:
-------------------------------------------------------------------------

Hi [~trohrmann], from the log I think the events happened in the following order:
 * triggerCheckpoint: eager pre-checks
 * job cancel
 * triggerCheckpoint: actual checkpoint trigger

All three steps are guarded by the lock, but the lock is released after step 1 
and re-acquired in step 3, so the job cancel can run in between.

We check whether the coordinator has been stopped in the eager pre-checks, but 
not in the actual checkpoint trigger.
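
The ordering above can be reproduced in isolation. The following is a minimal sketch, not Flink's actual code: the class SketchCoordinator, its fields, and the betweenSteps hook are hypothetical names introduced only to force the cancel to land between the two locked sections. The recheckInTrigger flag shows one possible remedy (checking the stopped flag again after re-acquiring the lock); it is an assumption for illustration and not necessarily the fix adopted for this issue.

```java
public class CoordinatorRaceSketch {

    /** Hypothetical stand-in for a checkpoint coordinator; not Flink's class. */
    static class SketchCoordinator {
        private final Object lock = new Object();
        private boolean shutdown;              // guarded by lock
        private int triggered;                 // guarded by lock
        private final boolean recheckInTrigger;

        SketchCoordinator(boolean recheckInTrigger) {
            this.recheckInTrigger = recheckInTrigger;
        }

        /** Models the job cancel stopping the coordinator. */
        void shutdown() {
            synchronized (lock) {
                shutdown = true;
            }
        }

        int triggeredCount() {
            synchronized (lock) {
                return triggered;
            }
        }

        /**
         * Returns true if a checkpoint was triggered. The betweenSteps hook
         * runs after the lock from step 1 has been released, which is where
         * a concurrent cancel could be scheduled.
         */
        boolean triggerCheckpoint(Runnable betweenSteps) {
            // Step 1: eager pre-checks, under the lock.
            synchronized (lock) {
                if (shutdown) {
                    return false;
                }
            }

            // Step 2: the lock is free here, so a cancel can slip in.
            betweenSteps.run();

            // Step 3: actual trigger, lock re-acquired.
            synchronized (lock) {
                if (recheckInTrigger && shutdown) {
                    return false; // stopped in the meantime, so abort
                }
                triggered++;
                return true;
            }
        }
    }

    public static void main(String[] args) {
        SketchCoordinator racy = new SketchCoordinator(false);
        System.out.println("no re-check, triggered after cancel: "
                + racy.triggerCheckpoint(racy::shutdown));

        SketchCoordinator guarded = new SketchCoordinator(true);
        System.out.println("with re-check, triggered after cancel: "
                + guarded.triggerCheckpoint(guarded::shutdown));
    }
}
```

Without the second check the sketch still triggers a checkpoint on a coordinator that has already been stopped; with it, the trigger is abandoned.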


was (Author: klion26):
Hi [~trohrmann], from the log I think the events happened in the following order:
 * triggerCheckpoint: eager pre-checks
 * job cancel
 * triggerCheckpoint: actual checkpoint trigger

We check whether the coordinator has been stopped in the eager pre-checks, but 
not in the actual checkpoint trigger.

> Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end 
> test fails on Travis
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-13969
>                 URL: https://issues.apache.org/jira/browse/FLINK-13969
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.10.0
>
>
> The {{Resuming Externalized Checkpoint (rocks, incremental, scale down)}} 
> end-to-end test fails on Travis because its log contains an exception
> {code}
> org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 16 for operator ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (2/4). Failure reason: Checkpoint was declined.
>       at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1302)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1236)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:892)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:797)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:728)
>       at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:88)
>       at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:177)
>       at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
>       at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:118)
>       at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:48)
>       at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:144)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:277)
>       at org.apache.flink.streaming.runtime.tasks.mailbox.execution.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:147)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:404)
>       at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
>       at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
>       at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:122)
>       at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:110)
>       at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:104)
>       at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:170)
>       at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:126)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:439)
>       at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:411)
>       ... 17 more
> {code}
> https://api.travis-ci.org/v3/job/580915660/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
