[ https://issues.apache.org/jira/browse/FLINK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206112#comment-17206112 ]
Arvid Heise commented on FLINK-19027:
-------------------------------------
I found the cause: after redeployment, the checkpoint times out for a yet unknown reason, causing a 10 min delay. The test, however, has a timeout of only 5 min.
{noformat}
13521 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 12 (type=CHECKPOINT) @ 1601636334880 for job 0b6b6a79f7a331282e512f87292c48ea.
13524 [Map (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Registering task at network: Map (1/1) (0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) [DEPLOYING].
13526 [flink-akka.actor.default-dispatcher-6] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received task Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4), deploy into slot with allocation id c56bdbebf5db9193d896037c313d53ed.
13526 [Sink: Unnamed (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) switched from CREATED to DEPLOYING.
13527 [Sink: Unnamed (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Loading JAR files for task Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) [DEPLOYING].
13527 [Map (1/1)] INFO org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using job/cluster config to configure application-defined state backend: File State Backend (checkpoints: 'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13527 [Map (1/1)] INFO org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using application-defined state backend: File State Backend (checkpoints: 'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13528 [Sink: Unnamed (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Registering task at network: Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) [DEPLOYING].
13528 [Map (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Map (1/1) (0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) switched from DEPLOYING to RUNNING.
13529 [Source: source (1/1)] INFO org.apache.flink.test.checkpointing.UnalignedCheckpointITCase [] - Snapshotted next input 1757 @ 0 subtask (? attempt)
13529 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) switched from DEPLOYING to RUNNING.
13530 [Sink: Unnamed (1/1)] INFO org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using job/cluster config to configure application-defined state backend: File State Backend (checkpoints: 'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13530 [Sink: Unnamed (1/1)] INFO org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using application-defined state backend: File State Backend (checkpoints: 'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13531 [Sink: Unnamed (1/1)] INFO org.apache.flink.runtime.taskmanager.Task [] - Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) switched from DEPLOYING to RUNNING.
13532 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Unnamed (1/1) (0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) switched from DEPLOYING to RUNNING.
13542 [Sink: Unnamed (1/1)] INFO org.apache.flink.test.checkpointing.UnalignedCheckpointITCase [] - Initialized last snapshotted records [[130]] @ 0 subtask (4 attempt)
613532 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint 12 of job 0b6b6a79f7a331282e512f87292c48ea expired before completing.
{noformat}
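The 10 min figure can be read directly off the log: checkpoint 12 is triggered at 13521 and reported as expired at 613532. The minimal sketch below does that arithmetic, assuming the leading number of each log line is the elapsed time in milliseconds (log4j's relative timestamp); the class, method, and constant names are hypothetical and exist only for illustration. The resulting 600011 ms matches Flink's default checkpoint timeout of 10 minutes (600000 ms) and is roughly twice the 300 s JUnit timeout reported in the issue description.

```java
/** Hypothetical helper: checks the timing in the log excerpt against the test timeout. */
final class CheckpointTimeoutCheck {
    // Assumption: the first column of each log line is milliseconds since logging started.
    static final long TRIGGERED_AT_MS = 13_521L;  // "Triggering checkpoint 12 ..."
    static final long EXPIRED_AT_MS = 613_532L;   // "Checkpoint 12 of job ... expired before completing."
    static final long TEST_TIMEOUT_MS = 300_000L; // "test timed out after 300 seconds"

    /** Time the checkpoint stayed pending, in milliseconds. */
    static long pendingMillis(long triggeredAtMs, long expiredAtMs) {
        return expiredAtMs - triggeredAtMs;
    }

    /** True if a wait of the given length cannot finish inside the test's timeout. */
    static boolean exceedsTestTimeout(long waitMs, long testTimeoutMs) {
        return waitMs > testTimeoutMs;
    }

    public static void main(String[] args) {
        long pending = pendingMillis(TRIGGERED_AT_MS, EXPIRED_AT_MS);
        // Integer division: whole minutes only.
        System.out.println("checkpoint 12 pending for " + pending + " ms (~" + pending / 60_000 + " min)");
        System.out.println("exceeds test timeout: " + exceedsTestTimeout(pending, TEST_TIMEOUT_MS));
    }
}
```

Running main prints "checkpoint 12 pending for 600011 ms (~10 min)" followed by "exceeds test timeout: true".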
> UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel failed because of test timeout
> ----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-19027
> URL: https://issues.apache.org/jira/browse/FLINK-19027
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.0, 1.11.2
> Reporter: Dian Fu
> Assignee: Arvid Heise
> Priority: Major
> Labels: test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5789&view=logs&j=119bbba7-f5e3-5e08-e72d-09f1529665de&t=ec103906-d047-5b8a-680e-05fc000dfca9]
> {code}
> 2020-08-22T21:13:05.5315459Z [ERROR] shouldPerformUnalignedCheckpointOnParallelRemoteChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase) Time elapsed: 300.075 s <<< ERROR!
> 2020-08-22T21:13:05.5316451Z org.junit.runners.model.TestTimedOutException: test timed out after 300 seconds
> 2020-08-22T21:13:05.5317432Z at sun.misc.Unsafe.park(Native Method)
> 2020-08-22T21:13:05.5317799Z at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-08-22T21:13:05.5318247Z at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2020-08-22T21:13:05.5318885Z at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> 2020-08-22T21:13:05.5327035Z at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2020-08-22T21:13:05.5328114Z at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-08-22T21:13:05.5328869Z at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1719)
> 2020-08-22T21:13:05.5329482Z at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
> 2020-08-22T21:13:05.5330138Z at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1699)
> 2020-08-22T21:13:05.5330771Z at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1681)
> 2020-08-22T21:13:05.5331351Z at org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:158)
> 2020-08-22T21:13:05.5332015Z at org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel(UnalignedCheckpointITCase.java:140)
> {code}
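The stack trace shows why the failure surfaces as a JUnit timeout rather than a Flink error: the test thread is parked in the no-argument CompletableFuture.get() (via waitingGet) inside StreamExecutionEnvironment.execute, which waits without a deadline of its own, so the only bound on it is JUnit's 300 s limit. The sketch below contrasts that with a bounded wait using plain java.util.concurrent; the class name, method name, and stand-in futures are hypothetical, and this is neither Flink's API nor the fix applied for this ticket.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits for a job-result future with an explicit deadline. */
final class BoundedJobWait {
    /**
     * Returns true if the future completed (normally or exceptionally) within budgetMillis,
     * false if the deadline passed or the waiting thread was interrupted.
     */
    static boolean completesWithin(CompletableFuture<?> jobResult, long budgetMillis) {
        try {
            // Unlike the no-argument get() in the stack trace, this call gives up after the budget.
            jobResult.get(budgetMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            return true; // completed, but with a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a job that never finishes, e.g. one held up by a stalled checkpoint.
        CompletableFuture<Void> stuckJob = new CompletableFuture<>();
        // Stand-in for a job that has already finished.
        CompletableFuture<String> finishedJob = CompletableFuture.completedFuture("done");

        System.out.println("stuck job finished in time: " + completesWithin(stuckJob, 100));
        System.out.println("finished job finished in time: " + completesWithin(finishedJob, 100));
    }
}
```

A bounded wait like this lets the caller fail with its own message before an outer timeout kills the thread; it does not address why the checkpoint stalls after redeployment.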
--
This message was sent by Atlassian Jira
(v8.3.4#803005)