[ https://issues.apache.org/jira/browse/FLINK-38223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041245#comment-18041245 ]
Rui Fan commented on FLINK-38223:
---------------------------------
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=71115&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=1ffc5ec2-7913-50ff-0177-3fca16f1b8f0&l=72092
> ExecutionGraphRestartTest and ExecutionGraphCoLocationRestartTest are flaky on master
> -------------------------------------------------------------------------------------
>
> Key: FLINK-38223
> URL: https://issues.apache.org/jira/browse/FLINK-38223
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.1.0
> Reporter: Gustavo de Morais
> Assignee: Fabian Paul
> Priority: Major
> Fix For: 2.3.0
>
>
> Both of these suites are very flaky on master. Tests such as
> testConstraintsAfterRestart and testCancelWhileFailing keep failing
> CI pipelines with errors like the ones shown below.
> You can reproduce the failures by running the suites locally.
> {code:java}
> Aug 11 00:04:37 00:04:37.047 [ERROR] Errors:
> Aug 11 00:04:37 00:04:37.047 [ERROR] ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
> {code}
> {code:java}
> org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
>     Suppressed: java.lang.IllegalStateException: Free slot must not be used.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
>         at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
>         ... 7 more
> {code}
> {code:java}
> java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> {code}
> Example CI run:
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=69283&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=1ffc5ec2-7913-50ff-0177-3fca16f1b8f0]
>