[ 
https://issues.apache.org/jira/browse/FLINK-22367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325664#comment-17325664
 ] 

Chesnay Schepler commented on FLINK-22367:
------------------------------------------

hmm... it seems like a task isn't shutting down after a stopWithSavepoint 
operation. The savepoint operation itself seems to work fine, but only 1 
subtask shuts down afterwards. After 10 seconds the test is timing out:
{code:java}
22:28:32,329 [  testVertex (2/2)#0] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - testVertex 
(2/2)#0 (043f6b5353f19eee95021c9988dc2907) switched from DEPLOYING to 
INITIALIZING.
22:28:32,330 [  testVertex (2/2)#0] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - testVertex 
(2/2)#0 (043f6b5353f19eee95021c9988dc2907) switched from INITIALIZING to 
RUNNING.
22:28:32,331 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - testVertex 
(2/2) (043f6b5353f19eee95021c9988dc2907) switched from DEPLOYING to 
INITIALIZING.
22:28:32,331 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - testVertex 
(2/2) (043f6b5353f19eee95021c9988dc2907) switched from INITIALIZING to RUNNING.
22:28:32,343 [    Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
checkpoint 1 (type=CHECKPOINT) @ 1618871312339 for job 
15f6c6d1527f3cbd07771ec998fbb38f.
22:28:32,344 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.jobmaster.JobMaster                 [] - Triggering 
stop-with-savepoint for job 15f6c6d1527f3cbd07771ec998fbb38f.
22:28:32,351 [jobmanager-future-thread-10] WARN  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late 
message for now expired checkpoint attempt 1 from task 
0febd0493e4d0151092e5d9f1d11f8d7 of job 15f6c6d1527f3cbd07771ec998fbb38f at 
0acad546-425d-4c54-81c6-318541f6fc12 @ localhost (dataPort=-1).
22:28:32,352 [jobmanager-future-thread-11] WARN  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late 
message for now expired checkpoint attempt 1 from task 
043f6b5353f19eee95021c9988dc2907 of job 15f6c6d1527f3cbd07771ec998fbb38f at 
0acad546-425d-4c54-81c6-318541f6fc12 @ localhost (dataPort=-1).
22:28:32,358 [    Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
checkpoint 2 (type=SAVEPOINT_TERMINATE) @ 1618871312356 for job 
15f6c6d1527f3cbd07771ec998fbb38f.
22:28:32,405 [jobmanager-future-thread-7] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
checkpoint 2 for job 15f6c6d1527f3cbd07771ec998fbb38f (0 bytes in 48 ms).
22:28:32,413 [  testVertex (1/2)#0] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - testVertex 
(1/2)#0 (0febd0493e4d0151092e5d9f1d11f8d7) switched from RUNNING to FINISHED.
22:28:32,413 [  testVertex (1/2)#0] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task 
resources for testVertex (1/2)#0 (0febd0493e4d0151092e5d9f1d11f8d7).
22:28:32,414 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - 
Un-registering task and sending final execution state FINISHED to JobManager 
for task testVertex (1/2)#0 0febd0493e4d0151092e5d9f1d11f8d7.
22:28:32,423 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - testVertex 
(1/2) (0febd0493e4d0151092e5d9f1d11f8d7) switched from RUNNING to 
FINISHED.22:28:42,376 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Unnamed 
job (15f6c6d1527f3cbd07771ec998fbb38f) switched from state RUNNING to 
CANCELLING.
22:28:42,377 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding 
the results produced by task execution 0febd0493e4d0151092e5d9f1d11f8d7.
22:28:42,378 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - testVertex 
(2/2) (043f6b5353f19eee95021c9988dc2907) switched from RUNNING to CANCELING.
22:28:42,382 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to 
cancel task testVertex (2/2)#0 (043f6b5353f19eee95021c9988dc2907).
22:28:42,383 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - testVertex 
(2/2)#0 (043f6b5353f19eee95021c9988dc2907) switched from RUNNING to CANCELING.
22:28:42,383 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - Triggering 
cancellation of task code testVertex (2/2)#0 (043f6b5353f19eee95021c9988dc2907).
22:28:42,385 [  testVertex (2/2)#0] INFO  
org.apache.flink.runtime.taskmanager.Task                    [] - testVertex 
(2/2)#0 (043f6b5353f19eee95021c9988dc2907) switched from CANCELING to CANCELED. 
{code}

> JobMasterStopWithSavepointITCase.terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished
>  times out
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22367
>                 URL: https://issues.apache.org/jira/browse/FLINK-22367
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Chesnay Schepler
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16818&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=2c7d57b9-7341-5a87-c9af-2cf7cc1a37dc&l=3844
> {code}
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 13.135 s <<< FAILURE! - in 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase
> Apr 19 22:28:44 [ERROR] 
> terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished(org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase)
>   Time elapsed: 10.237 s  <<< ERROR!
> Apr 19 22:28:44 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public default 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.stopWithSavepoint(org.apache.flink.api.common.JobID,java.lang.String,boolean,org.apache.flink.api.common.time.Time)
>  timed out.
> Apr 19 22:28:44       at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Apr 19 22:28:44       at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Apr 19 22:28:44       at 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase.stopWithSavepointNormalExecutionHelper(JobMasterStopWithSavepointITCase.java:123)
> Apr 19 22:28:44       at 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase.terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished(JobMasterStopWithSavepointITCase.java:111)
> Apr 19 22:28:44       at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Apr 19 22:28:44       at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Apr 19 22:28:44       at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Apr 19 22:28:44       at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Apr 19 22:28:44       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Apr 19 22:28:44       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Apr 19 22:28:44       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Apr 19 22:28:44       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Apr 19 22:28:44       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Apr 19 22:28:44       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44       at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Apr 19 22:28:44       at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> Apr 19 22:28:44       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> Apr 19 22:28:44       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> Apr 19 22:28:44       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Apr 19 22:28:44       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> Apr 19 22:28:44       at org.junit.runners.Suite.runChild(Suite.java:128)
> Apr 19 22:28:44       at org.junit.runners.Suite.runChild(Suite.java:27)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Apr 19 22:28:44       at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> Apr 19 22:28:44       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to