[ https://issues.apache.org/jira/browse/FLINK-23201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Qin updated FLINK-23201:
----------------------------
    Description: 
The check on alignmentDurationNanos seems to be too strict at the line:
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointMetrics.java#L74
See the following stack trace:

{code:java}
...
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task checkpoint failed.
    at org.apache.flink.runtime.messages.checkpoint.SerializedCheckpointException.unwrap(SerializedCheckpointException.java:51) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:975) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$declineCheckpoint$8(SchedulerBase.java:1076) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_282]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_282]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable: Asynchronous task checkpoint failed.
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:261) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:174) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable: Could not materialize checkpoint 151 for operator equipment internal with metadata -> (Sink: not enriched kafka sink, map back to EquipmentInternal -> Sink: ForEquipmentinternal) (1/1)#0.
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:239) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:174) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
Caused by: org.apache.flink.util.SerializedThrowable
    at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:122) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointMetrics.<init>(CheckpointMetrics.java:63) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.runtime.checkpoint.CheckpointMetricsBuilder.build(CheckpointMetricsBuilder.java:123) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.reportCompletedSnapshotStates(AsyncCheckpointRunnable.java:202) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:157) ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
{code}

This caused the job to fail during stop-with-savepoint. Taking a savepoint 
without stopping the job does not seem to be affected.
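
For illustration, a minimal sketch of how a message-less argument check of this kind behaves. This is not the Flink source: the class name AlignmentCheckSketch, both bounds (>= 0 and >= -1), and the helper names are assumptions made here, since the actual condition at the linked line is not quoted in this report. The local checkArgument only mimics a boolean precondition that throws IllegalArgumentException without a message, which would be consistent with the innermost "Caused by" in the trace carrying no detail text.

```java
import java.util.function.LongUnaryOperator;

public class AlignmentCheckSketch {

    // Hypothetical stand-in for a boolean precondition check:
    // throws IllegalArgumentException with no message when the condition is false.
    static void checkArgument(boolean condition) {
        if (!condition) {
            throw new IllegalArgumentException();
        }
    }

    // Assumed strict variant: rejects every negative value.
    static long strictAlignmentDuration(long alignmentDurationNanos) {
        checkArgument(alignmentDurationNanos >= 0);
        return alignmentDurationNanos;
    }

    // Assumed lenient variant: tolerates -1 as an "unknown or not set" marker.
    static long lenientAlignmentDuration(long alignmentDurationNanos) {
        checkArgument(alignmentDurationNanos >= -1);
        return alignmentDurationNanos;
    }

    // Returns true if the given check accepts the value, false if it throws.
    static boolean accepts(LongUnaryOperator check, long value) {
        try {
            check.applyAsLong(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long unset = -1L; // hypothetical "not measured" value
        System.out.println("strict check accepts -1:  "
                + accepts(AlignmentCheckSketch::strictAlignmentDuration, unset));
        System.out.println("lenient check accepts -1: "
                + accepts(AlignmentCheckSketch::lenientAlignmentDuration, unset));
    }
}
```

Under these assumptions, a duration that was never measured would pass the lenient bound and fail the strict one. Whether that is what happens during stop-with-savepoint still needs to be confirmed against the actual code.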

  was:
The check on alignmentDurationNanos seems to be too strict at the line:

https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointMetrics.java#L74

This may cause a job to fail when doing stop-with-savepoint.


> The check on alignmentDurationNanos seems to be too strict
> ----------------------------------------------------------
>
>                 Key: FLINK-23201
>                 URL: https://issues.apache.org/jira/browse/FLINK-23201
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.12.2
>            Reporter: Jun Qin
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
