Robert Metzger created FLINK-5462:
-------------------------------------
Summary: Flink job fails due to java.util.concurrent.CancellationException while snapshotting
Key: FLINK-5462
URL: https://issues.apache.org/jira/browse/FLINK-5462
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Robert Metzger
I'm using Flink 699f4b0.
My restored, rescaled Flink job failed while creating a checkpoint with the
following exception:
{code}
2017-01-11 18:46:49,853 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 3 @ 1484160409846
2017-01-11 18:49:50,111 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9) switched from RUNNING to FAILED.
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).}
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).
	... 6 more
Caused by: java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899)
	... 5 more
2017-01-11 18:49:50,113 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Generate Event Window stream (90859d392c1da472e07695f434b332ef) switched from state RUNNING to FAILING.
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).}
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).
	... 6 more
Caused by: java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899)
	... 5 more
2017-01-11 18:49:50,122 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Timestamps/Watermarks (1/2) (e52c1211b5693552f5908b0082c80882) switched from RUNNING to CANCELING.
{code}
There are no other log entries around that time.
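The innermost cause in the trace is `FutureTask.get` (via `FutureTask.report`) throwing `CancellationException`, reached through `FutureUtil.runIfNotDoneAndGet`. The sketch below reproduces that mechanism using only `java.util.concurrent`. Assumptions: `runIfNotDoneAndGet` here is a hypothetical stand-in modelled on the frame name, not Flink's actual source, and the early `cancel(true)` is an assumed trigger, because the report does not say what cancelled the snapshot future.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;

// Hypothetical demo, not Flink code: shows how get() on an already
// cancelled FutureTask yields the "Caused by: CancellationException" frames.
class CancelledSnapshotDemo {

    // Hypothetical stand-in for FutureUtil.runIfNotDoneAndGet (assumed behaviour):
    // run the future on the calling thread unless it is already done
    // (completed, failed, or cancelled), then fetch its result.
    static <T> T runIfNotDoneAndGet(RunnableFuture<T> future)
            throws ExecutionException, InterruptedException {
        if (future == null) {
            return null;
        }
        if (!future.isDone()) {
            future.run();
        }
        // For a cancelled FutureTask this throws the unchecked
        // CancellationException from FutureTask.report.
        return future.get();
    }

    // Simulates a snapshot future cancelled before the async thread
    // materializes it; returns whatever exception was observed.
    static Throwable materializeCancelledSnapshot() throws InterruptedException {
        RunnableFuture<String> snapshot = new FutureTask<>(() -> "state handle");
        snapshot.cancel(true); // assumed trigger: a concurrent cancel
        try {
            runIfNotDoneAndGet(snapshot);
            return null;
        } catch (CancellationException | ExecutionException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable t = materializeCancelledSnapshot();
        // prints "java.util.concurrent.CancellationException"
        System.out.println(t.getClass().getName());
    }
}
```

Because `CancellationException` is unchecked, a caller that only catches `ExecutionException` around `get()` does not intercept it, and it propagates as a generic failure.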
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)