[ 
https://issues.apache.org/jira/browse/FLINK-22493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335336#comment-17335336
 ] 

Robert Metzger commented on FLINK-22493:
----------------------------------------

I believe the problem is the following:

Once all tasks are running, the test triggers a savepoint, which intentionally 
fails because a task's checkpointing method throws a test exception. The test 
then waits for the savepoint future to fail and for the scheduler to restart 
the tasks. Once they are running again, it performs a sanity check that the 
savepoint directory has been properly removed. In the reported run, the 
savepoint directory was still present.

The savepoint directory is removed via the PendingCheckpoint.discard() method. 
This method is executed on the I/O executor pool of the 
CheckpointCoordinator. There is no guarantee that the discard has been 
executed by the time the job is running again: the executor is shut down with 
the dispatcher, so its lifetime is not bound to job restarts.
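
To illustrate the race, here is a standalone sketch (not Flink code). The 
class AsyncDiscardRace, its method, and the latch are hypothetical and exist 
only to force the unlucky interleaving deterministically: the directory is 
deleted on a separate executor, and the check runs before the deletion does.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Flink.
public class AsyncDiscardRace {

    /**
     * Returns true if the directory was still visible at the moment the
     * "job is running again" check was performed.
     */
    public static boolean directoryVisibleAfterRestart() throws Exception {
        Path savepointDir = Files.createTempDirectory("savepoint-");
        // Stands in for the I/O executor that runs the discard.
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        // Assumption: the latch models a discard that is scheduled late.
        CountDownLatch allowDiscard = new CountDownLatch(1);
        try {
            Future<?> discard = ioExecutor.submit(() -> {
                allowDiscard.await();
                Files.deleteIfExists(savepointDir);
                return null;
            });

            // The check runs while the discard is still pending.
            boolean stillThere = Files.exists(savepointDir);

            // Let the discard proceed and wait for it to finish.
            allowDiscard.countDown();
            discard.get();
            if (Files.exists(savepointDir)) {
                throw new IllegalStateException("discard did not remove the directory");
            }
            return stillThere;
        } finally {
            ioExecutor.shutdownNow();
        }
    }
}
```

In the sketch the directory is eventually removed, but a check performed 
right after the "restart" still observes it, which matches the symptom in the 
reported run.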

I'll open a small PR to harden the test.
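
One possible way to harden such a check, sketched under assumptions (this is 
not necessarily what the PR does): poll until the directory is empty or a 
timeout expires instead of asserting once. The helper class 
SavepointCleanupWait and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.stream.Stream;

// Hypothetical test helper; not part of Flink.
public class SavepointCleanupWait {

    /** A missing directory counts as empty. */
    static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (NoSuchFileException e) {
            return true;
        }
    }

    /**
     * Polls until the directory is empty. Returns false if the timeout
     * expires while entries are still present.
     */
    public static boolean waitUntilEmpty(Path dir, Duration timeout)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isEmpty(dir)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }
}
```

A test could then fail with the "Found unexpected files" message only if 
waitUntilEmpty(...) returns false, so that a slow asynchronous discard no 
longer causes a spurious failure.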

> AdaptiveSchedulerITCase.testStopWithSavepointFailOnFirstSavepointSucceedOnSecond
>  found unexpected files
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22493
>                 URL: https://issues.apache.org/jira/browse/FLINK-22493
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>         Attachments: 
> AdaptiveSchedulerITCase.testStopWithSavepointFailOnFirstSavepointSucceedOnSecond.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17285&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=5360d54c-8d94-5d85-304e-a89267eb785a&l=9340
> {code}
> Apr 27 11:10:07 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> Apr 27 11:10:24 [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 17.177 s <<< FAILURE! - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> Apr 27 11:10:24 [ERROR] 
> testStopWithSavepointFailOnFirstSavepointSucceedOnSecond(org.apache.flink.test.scheduling.AdaptiveSchedulerITCase)
>   Time elapsed: 0.305 s  <<< FAILURE!
> Apr 27 11:10:24 java.lang.AssertionError: Found unexpected files: 
> /tmp/junit3745203124457058148/savepoint/savepoint-8596b1-b3046c9bcf40
> Apr 27 11:10:24       at org.junit.Assert.fail(Assert.java:88)
> Apr 27 11:10:24       at 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase.testStopWithSavepointFailOnFirstSavepointSucceedOnSecond(AdaptiveSchedulerITCase.java:226)
> Apr 27 11:10:24       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Apr 27 11:10:24       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Apr 27 11:10:24       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Apr 27 11:10:24       at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 27 11:10:24       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Apr 27 11:10:24       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Apr 27 11:10:24       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Apr 27 11:10:24       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Apr 27 11:10:24       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> Apr 27 11:10:24       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Apr 27 11:10:24       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 27 11:10:24       at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Apr 27 11:10:24       at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> Apr 27 11:10:24       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> Apr 27 11:10:24       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> Apr 27 11:10:24       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Apr 27 11:10:24       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 27 11:10:24       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 27 11:10:24       at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> Apr 27 11:10:24       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Apr 27 11:10:24 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
