[ https://issues.apache.org/jira/browse/FLINK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868720#comment-17868720 ]

Matthias Pohl commented on FLINK-35672:
---------------------------------------

The actual error seems to be the following stacktrace:
{code}
03:50:41,377 [jobmanager-io-thread-2] WARN  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Error while processing AcknowledgeCheckpoint message
java.lang.IllegalStateException: Attempt to reference unknown state: 49b8942d-face-3496-8ade-18195ab5748b
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerReference(SharedStateRegistryImpl.java:97) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.state.SharedStateRegistry.registerReference(SharedStateRegistry.java:53) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle.registerSharedStates(IncrementalRemoteKeyedStateHandle.java:289) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedState(OperatorSubtaskState.java:243) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedStates(OperatorSubtaskState.java:226) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.registerSharedStates(TaskStateSnapshot.java:193) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1245) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$2(ExecutionGraphHandler.java:109) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$4(ExecutionGraphHandler.java:139) ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at org.apache.flink.util.MdcUtils.lambda$wrapRunnable$1(MdcUtils.java:64) ~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
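To make the failing precondition easier to picture, here is a minimal Java sketch under simplifying assumptions. All names (SharedStateRegistrySketch, Handle, registerReference, unregisterReference) are hypothetical, and the plain reference counting is an assumption for illustration; Flink's actual SharedStateRegistryImpl tracks entries differently. The idea it models: an incremental checkpoint presumably sends a placeholder for a file that an earlier checkpoint already uploaded, and if the registry no longer holds an entry for that key, registration fails with "Attempt to reference unknown state".

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a shared-state registry.
 * NOT Flink's SharedStateRegistryImpl; reference counting is an assumption.
 */
public class SharedStateRegistrySketch {

    /** Hypothetical handle: real uploaded state, or a placeholder for state uploaded earlier. */
    static final class Handle {
        final String key;
        final boolean placeholder;

        Handle(String key, boolean placeholder) {
            this.key = key;
            this.placeholder = placeholder;
        }
    }

    // registration key -> number of references currently held (assumed bookkeeping)
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Registers a reference; mirrors the kind of precondition seen in the stack trace. */
    public void registerReference(Handle handle) {
        Integer count = refCounts.get(handle.key);
        if (count == null) {
            // A placeholder can only point at state registered earlier.
            // If that entry is missing (e.g. already discarded), fail fast.
            if (handle.placeholder) {
                throw new IllegalStateException("Attempt to reference unknown state: " + handle.key);
            }
            refCounts.put(handle.key, 1);
        } else {
            refCounts.put(handle.key, count + 1);
        }
    }

    /** Drops one reference; the entry is removed once nothing references it. */
    public void unregisterReference(String key) {
        Integer count = refCounts.get(key);
        if (count == null) {
            return;
        }
        if (count <= 1) {
            refCounts.remove(key);
        } else {
            refCounts.put(key, count - 1);
        }
    }

    public boolean contains(String key) {
        return refCounts.containsKey(key);
    }

    public static void main(String[] args) {
        SharedStateRegistrySketch registry = new SharedStateRegistrySketch();
        registry.registerReference(new Handle("sst-1", false)); // first checkpoint uploads sst-1
        registry.registerReference(new Handle("sst-1", true));  // next checkpoint reuses it via placeholder
        registry.unregisterReference("sst-1");
        registry.unregisterReference("sst-1");                  // entry is now gone
        try {
            // a later checkpoint still sends a placeholder for the discarded key
            registry.registerReference(new Handle("sst-1", true));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Running main prints "Attempt to reference unknown state: sst-1", the same shape of message as in the trace, here for a made-up key.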
I attached an extract from the logs that were referenced in this Jira issue's 
description.

> testPreAggregatedSlidingTimeWindow failed due to checkpoint expired before completing
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35672
>                 URL: https://issues.apache.org/jira/browse/FLINK-35672
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI
>    Affects Versions: 1.20.0
>            Reporter: Weijie Guo
>            Priority: Major
>
> {code:java}
> Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold. The latest checkpoint failed due to Checkpoint expired before completing., view the Checkpoint History tab or the Job Manager log to find out why continuous checkpoints failed.
>       at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.checkFailureAgainstCounter(CheckpointFailureManager.java:212)
>       at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:169)
>       at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:122)
>       at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2281)
>       at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2260)
>       at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$1200(CheckpointCoordinator.java:102)
>       at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2346)
>       at org.apache.flink.util.MdcUtils.lambda$wrapRunnable$1(MdcUtils.java:64)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60404&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=8785
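
One plausible reading of how the two traces relate: if processing an AcknowledgeCheckpoint message fails, the pending checkpoint presumably never collects all acknowledgements, expires, and counts as a failure; once consecutive failures exceed the configured tolerance, the job fails with the FlinkRuntimeException quoted in the description. The Java sketch below is a simplified, hypothetical model of that counter. Its names are made up and it is not Flink's CheckpointFailureManager, which among other differences distinguishes between failure reasons.

{code:java}
/**
 * Hypothetical, simplified model of tolerable-failure counting.
 * NOT Flink's CheckpointFailureManager; names and details are assumptions.
 */
public class TolerableFailureCounterSketch {

    // assumed analogue of the configured number of tolerable failed checkpoints
    private final int tolerableFailures;
    // consecutive failures since the last successful checkpoint
    private int continuousFailures = 0;

    public TolerableFailureCounterSketch(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** A completed checkpoint resets the run of consecutive failures. */
    public void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    /** A failed (e.g. expired) checkpoint; throws once the tolerance is exceeded. */
    public void onCheckpointFailure(String reason) {
        continuousFailures++;
        if (continuousFailures > tolerableFailures) {
            throw new RuntimeException(
                    "Exceeded checkpoint tolerable failure threshold. The latest checkpoint failed due to "
                            + reason);
        }
    }

    public int getContinuousFailures() {
        return continuousFailures;
    }

    public static void main(String[] args) {
        TolerableFailureCounterSketch counter = new TolerableFailureCounterSketch(2);
        counter.onCheckpointFailure("Checkpoint expired before completing.");
        counter.onCheckpointFailure("Checkpoint expired before completing.");
        try {
            // third consecutive failure exceeds a tolerance of 2
            counter.onCheckpointFailure("Checkpoint expired before completing.");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

With a tolerance of 2, the third consecutive expired checkpoint triggers the exception. In Flink itself the tolerance is configured via CheckpointConfig#setTolerableCheckpointFailureNumber.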



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
