[ 
https://issues.apache.org/jira/browse/FLINK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206112#comment-17206112
 ] 

Arvid Heise commented on FLINK-19027:
-------------------------------------

I found the cause: after redeployment, the checkpoint times out for an as yet 
unknown reason, which causes a 10 min delay. The test, however, has a timeout 
of only 5 min, so it fails before the checkpoint expires.

 
{noformat}
13521 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering 
checkpoint 12 (type=CHECKPOINT) @ 1601636334880 for job 
0b6b6a79f7a331282e512f87292c48ea.
13524 [Map (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] - 
Registering task at network: Map (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) 
[DEPLOYING].
13526 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received task Sink: 
Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4), deploy 
into slot with allocation id c56bdbebf5db9193d896037c313d53ed.
13526 [Sink: Unnamed (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] 
- Sink: Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) 
switched from CREATED to DEPLOYING.
13527 [Sink: Unnamed (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] 
- Loading JAR files for task Sink: Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) 
[DEPLOYING].
13527 [Map (1/1)] INFO  org.apache.flink.streaming.runtime.tasks.StreamTask [] 
- Using job/cluster config to configure application-defined state backend: File 
State Backend (checkpoints: 
'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133',
 savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13527 [Map (1/1)] INFO  org.apache.flink.streaming.runtime.tasks.StreamTask [] 
- Using application-defined state backend: File State Backend (checkpoints: 
'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133',
 savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13528 [Sink: Unnamed (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] 
- Registering task at network: Sink: Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) 
[DEPLOYING].
13528 [Map (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] - Map 
(1/1) (0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) 
switched from DEPLOYING to RUNNING.
13529 [Source: source (1/1)] INFO  
org.apache.flink.test.checkpointing.UnalignedCheckpointITCase [] - Snapshotted 
next input 1757 @ 0 subtask (? attempt)
13529 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_0a448493b4782967b150582570326227_0_4) 
switched from DEPLOYING to RUNNING.
13530 [Sink: Unnamed (1/1)] INFO  
org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using job/cluster 
config to configure application-defined state backend: File State Backend 
(checkpoints: 
'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133',
 savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13530 [Sink: Unnamed (1/1)] INFO  
org.apache.flink.streaming.runtime.tasks.StreamTask [] - Using 
application-defined state backend: File State Backend (checkpoints: 
'file:/var/folders/dm/pnwfg9352vsft8vp3n743mmc0000gn/T/junit6732308533903563238/junit620424462495595133',
 savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
13531 [Sink: Unnamed (1/1)] INFO  org.apache.flink.runtime.taskmanager.Task [] 
- Sink: Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) 
switched from DEPLOYING to RUNNING.
13532 [flink-akka.actor.default-dispatcher-4] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Unnamed (1/1) 
(0b6b6a79f7a331282e512f87292c48ea_ea632d67b7d595e5b851708ae9ad79d6_0_4) 
switched from DEPLOYING to RUNNING.
13542 [Sink: Unnamed (1/1)] INFO  
org.apache.flink.test.checkpointing.UnalignedCheckpointITCase [] - Initialized 
last snapshotted records [[130]] @ 0 subtask (4 attempt)
613532 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint 12 of 
job 0b6b6a79f7a331282e512f87292c48ea expired before completing.
{noformat}
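
The timing in the log above can be checked with a little arithmetic. The following is only a sketch: the class and method names are made up for illustration and are not part of Flink or of the test. The two timestamps are the relative millisecond values from the "Triggering checkpoint 12" and "Checkpoint 12 ... expired before completing" lines, and the 300 s value is the JUnit timeout reported in the issue description.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only; not part of Flink.
public class CheckpointTimeoutCheck {
    // Relative log timestamps (ms) copied from the log excerpt.
    static final long TRIGGERED_AT_MS = 13_521L;  // "Triggering checkpoint 12"
    static final long EXPIRED_AT_MS = 613_532L;   // "Checkpoint 12 ... expired before completing"

    // JUnit timeout from the issue: "test timed out after 300 seconds".
    static final Duration TEST_TIMEOUT = Duration.ofSeconds(300);

    // Time between triggering the checkpoint and its expiry.
    static Duration checkpointLifetime(long triggeredAtMs, long expiredAtMs) {
        return Duration.ofMillis(expiredAtMs - triggeredAtMs);
    }

    // True if the test gives up before the checkpoint would expire.
    static boolean testTimesOutFirst(Duration checkpointLifetime, Duration testTimeout) {
        return checkpointLifetime.compareTo(testTimeout) > 0;
    }

    public static void main(String[] args) {
        Duration lifetime = checkpointLifetime(TRIGGERED_AT_MS, EXPIRED_AT_MS);
        System.out.println("Checkpoint 12 lifetime: " + lifetime.toMillis() + " ms");
        System.out.println("Test timeout:           " + TEST_TIMEOUT.toMillis() + " ms");
        System.out.println("Test times out first:   " + testTimesOutFirst(lifetime, TEST_TIMEOUT));
    }
}
```

The checkpoint stays pending for 600011 ms, roughly 10 min, which is twice the 300000 ms the test is allowed to run. The test therefore always hits its own timeout before the stuck checkpoint is declared expired and a new one can be triggered.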

> UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel
>  failed because of test timeout
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19027
>                 URL: https://issues.apache.org/jira/browse/FLINK-19027
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.11.2
>            Reporter: Dian Fu
>            Assignee: Arvid Heise
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.12.0, 1.11.3
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5789&view=logs&j=119bbba7-f5e3-5e08-e72d-09f1529665de&t=ec103906-d047-5b8a-680e-05fc000dfca9]
> {code}
> 2020-08-22T21:13:05.5315459Z [ERROR] 
> shouldPerformUnalignedCheckpointOnParallelRemoteChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)
>   Time elapsed: 300.075 s  <<< ERROR!
> 2020-08-22T21:13:05.5316451Z org.junit.runners.model.TestTimedOutException: 
> test timed out after 300 seconds
> 2020-08-22T21:13:05.5317432Z  at sun.misc.Unsafe.park(Native Method)
> 2020-08-22T21:13:05.5317799Z  at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-08-22T21:13:05.5318247Z  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2020-08-22T21:13:05.5318885Z  at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> 2020-08-22T21:13:05.5327035Z  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2020-08-22T21:13:05.5328114Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-08-22T21:13:05.5328869Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1719)
> 2020-08-22T21:13:05.5329482Z  at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
> 2020-08-22T21:13:05.5330138Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1699)
> 2020-08-22T21:13:05.5330771Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1681)
> 2020-08-22T21:13:05.5331351Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:158)
> 2020-08-22T21:13:05.5332015Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel(UnalignedCheckpointITCase.java:140)
> {code}



