[ https://issues.apache.org/jira/browse/FLINK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868720#comment-17868720 ]
Matthias Pohl commented on FLINK-35672:
---------------------------------------
The actual error seems to be the following stacktrace:
{code}
03:50:41,377 [jobmanager-io-thread-2] WARN org.apache.flink.runtime.jobmaster.JobMaster [] - Error while processing AcknowledgeCheckpoint message
java.lang.IllegalStateException: Attempt to reference unknown state: 49b8942d-face-3496-8ade-18195ab5748b
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerReference(SharedStateRegistryImpl.java:97) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.state.SharedStateRegistry.registerReference(SharedStateRegistry.java:53) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle.registerSharedStates(IncrementalRemoteKeyedStateHandle.java:289) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedState(OperatorSubtaskState.java:243) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedStates(OperatorSubtaskState.java:226) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.registerSharedStates(TaskStateSnapshot.java:193) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1245) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$2(ExecutionGraphHandler.java:109) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$4(ExecutionGraphHandler.java:139) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at org.apache.flink.util.MdcUtils.lambda$wrapRunnable$1(MdcUtils.java:64) ~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
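The trace ends in a {{checkState}} precondition inside {{SharedStateRegistryImpl.registerReference}}. To make that check concrete, here is a heavily simplified, hypothetical model of it. The class {{SharedStateRegistrySketch}}, its {{Handle}} type and the {{checkState}} helper are made up for illustration and are not Flink's API. The sketch assumes the registry rejects a placeholder handle (a reference to a file that an earlier checkpoint already uploaded) whose key has no registered entry.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of a shared-state registry.
 * This is NOT Flink's SharedStateRegistryImpl; it only illustrates the
 * invariant behind "Attempt to reference unknown state: <key>".
 */
public class SharedStateRegistrySketch {

    /** Hypothetical stand-in for a state handle: real content or a placeholder. */
    public static final class Handle {
        final String key;
        final boolean placeholder; // true = "reuse the file registered earlier"

        public Handle(String key, boolean placeholder) {
            this.key = key;
            this.placeholder = placeholder;
        }
    }

    private final Map<String, Handle> entries = new HashMap<>();

    /** Registers a handle from an acknowledged checkpoint and returns the registered one. */
    public Handle registerReference(Handle handle) {
        Handle existing = entries.get(handle.key);
        if (existing == null) {
            // A placeholder carries no content of its own, so it may only point
            // at a key that an earlier checkpoint already registered.
            checkState(!handle.placeholder, "Attempt to reference unknown state: %s", handle.key);
            entries.put(handle.key, handle);
            return handle;
        }
        // Known key: the entry registered first stays authoritative.
        return existing;
    }

    // Hypothetical helper shaped like a checkState-style precondition.
    private static void checkState(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalStateException(String.format(template, args));
        }
    }

    public static void main(String[] args) {
        SharedStateRegistrySketch registry = new SharedStateRegistrySketch();
        registry.registerReference(new Handle("a", false)); // first upload of "a"
        registry.registerReference(new Handle("a", true));  // later reuse of "a": accepted
        try {
            registry.registerReference(new Handle("b", true)); // "b" was never registered
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Attempt to reference unknown state: b"
        }
    }
}
{code}

Under this model, the exception means an acknowledgement referenced a key as "already uploaded" while the JobManager-side registry held no entry for it. The log alone does not establish why that entry was missing in this run.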
I attached an extract from the logs that were referenced in this Jira issue's
description.
> testPreAggregatedSlidingTimeWindow failed due to checkpoint expired
> before completing
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-35672
> URL: https://issues.apache.org/jira/browse/FLINK-35672
> Project: Flink
> Issue Type: Bug
> Components: Build System / CI
> Affects Versions: 1.20.0
> Reporter: Weijie Guo
> Priority: Major
>
> {code:java}
> Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold. The latest checkpoint failed due to Checkpoint expired before completing., view the Checkpoint History tab or the Job Manager log to find out why continuous checkpoints failed.
> 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.checkFailureAgainstCounter(CheckpointFailureManager.java:212)
> 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:169)
> 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:122)
> 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2281)
> 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2260)
> 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$1200(CheckpointCoordinator.java:102)
> 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2346)
> 	at org.apache.flink.util.MdcUtils.lambda$wrapRunnable$1(MdcUtils.java:64)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60404&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=8785
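The failure quoted from the issue description is the job-level outcome: checkpoints that expire are counted against a tolerable-failure threshold (configurable in Flink via {{execution.checkpointing.tolerable-failed-checkpoints}}), and once the threshold is exceeded the job fails. The sketch below is a minimal, hypothetical model of that counting, assuming a plain counter of consecutive failures that a successful checkpoint resets. The class {{TolerableFailureCounterSketch}} is made up for illustration; Flink's {{CheckpointFailureManager}} tracks more detail than this.

{code:java}
/**
 * Hypothetical, simplified model of a tolerable-failure counter. NOT Flink's
 * CheckpointFailureManager; it only illustrates why repeated expirations end
 * in "Exceeded checkpoint tolerable failure threshold".
 */
public class TolerableFailureCounterSketch {
    private final int tolerableFailures;
    private int continuousFailures;

    public TolerableFailureCounterSketch(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** Called when a checkpoint fails, e.g. because it expired before completing. */
    public void onCheckpointFailure(String reason) {
        continuousFailures++;
        if (continuousFailures > tolerableFailures) {
            // Plain RuntimeException here; no Flink dependency is assumed.
            throw new RuntimeException(
                    "Exceeded checkpoint tolerable failure threshold. The latest checkpoint failed due to "
                            + reason);
        }
    }

    /** A successful checkpoint resets the streak of consecutive failures. */
    public void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    public int continuousFailures() {
        return continuousFailures;
    }

    public static void main(String[] args) {
        // With a threshold of 0, the very first expired checkpoint fails the job.
        TolerableFailureCounterSketch counter = new TolerableFailureCounterSketch(0);
        try {
            counter.onCheckpointFailure("Checkpoint expired before completing.");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

In this model, a checkpoint whose acknowledgement can never be processed will eventually expire and count as a failure, which is consistent with, but does not by itself prove, a link between the two traces in this ticket.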