[ https://issues.apache.org/jira/browse/FLINK-23201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Qin updated FLINK-23201:
----------------------------
Description:
The check on alignmentDurationNanos seems to be too strict at the line:
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointMetrics.java#L74
See the following stack trace:
{code:java}
{code}
This caused the job to fail during stop-with-savepoint. Taking a savepoint
without stopping the job does not seem to be affected.
was:
The check on alignmentDurationNanos seems to be too strict at the line:
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointMetrics.java#L74
See the following stack trace:
{code:java}
...
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task checkpoint failed.
    at org.apache.flink.runtime.messages.checkpoint.SerializedCheckpointException.unwrap(SerializedCheckpointException.java:51) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:975) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$declineCheckpoint$8(SchedulerBase.java:1076) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_282]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_282]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable: Asynchronous task checkpoint failed.
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:261) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:174) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable: Could not materialize checkpoint 151 for operator equipment internal with metadata -> (Sink: not enriched kafka sink, map back to EquipmentInternal -> Sink: ForEquipmentinternal) (1/1)#0.
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:239) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:174) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable
    at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:122) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointMetrics.<init>(CheckpointMetrics.java:63) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointMetricsBuilder.build(CheckpointMetricsBuilder.java:123) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.reportCompletedSnapshotStates(AsyncCheckpointRunnable.java:202) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:157) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
{code}
This caused the job to fail during stop-with-savepoint. Taking a savepoint
without stopping the job does not seem to be affected.
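For illustration only: the innermost frames of the trace show an argument precondition failing inside the CheckpointMetrics constructor, with no exception message. The sketch below is a standalone stand-in, not Flink code. The class AlignmentCheckSketch, its local checkArgument helper, the assumed condition (alignmentDurationNanos >= 0), and the clamping alternative are all assumptions made for this example; the report does not state the exact condition or propose a fix.

```java
/** Hypothetical stand-in for illustration; this is NOT Flink's CheckpointMetrics. */
final class AlignmentCheckSketch {

    /** Local helper in the style of a message-less argument precondition. */
    static void checkArgument(boolean condition) {
        if (!condition) {
            throw new IllegalArgumentException();
        }
    }

    /** Strict variant: ASSUMED to resemble the failing check (non-negative only). */
    static long strict(long alignmentDurationNanos) {
        checkArgument(alignmentDurationNanos >= 0);
        return alignmentDurationNanos;
    }

    /** Lenient variant (an assumption, not the project's fix): clamp to zero. */
    static long lenient(long alignmentDurationNanos) {
        return Math.max(0L, alignmentDurationNanos);
    }

    /** Returns true if the strict variant rejects the given value. */
    static boolean strictRejects(long alignmentDurationNanos) {
        try {
            strict(alignmentDurationNanos);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A negative duration makes the strict variant throw,
        // while the lenient variant reports zero instead.
        System.out.println("strict rejects -1: " + strictRejects(-1L));
        System.out.println("lenient(-1): " + lenient(-1L));
    }
}
```

With the strict variant, a single out-of-range metric value is enough to fail the whole asynchronous checkpoint step, which matches the shape of the failure in the trace. Whether clamping, allowing a sentinel value, or fixing the producer of the value is the right remedy is left open here.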
> The check on alignmentDurationNanos seems to be too strict
> ----------------------------------------------------------
>
> Key: FLINK-23201
> URL: https://issues.apache.org/jira/browse/FLINK-23201
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.12.2
> Reporter: Jun Qin
> Priority: Major