[ https://issues.apache.org/jira/browse/FLINK-30141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735782#comment-17735782 ]
Ryan Skraba commented on FLINK-30141:
-------------------------------------
I took a look at this and the related issue FLINK-26402. The
*503 Service Unavailable* statuses are not rare: they occur in roughly 1 out of
every couple hundred API calls to Minio during container startup. The retry
mechanism built into the Amazon API clients _usually_ retries until the call
succeeds. Sometimes, though, the Minio container doesn't reach a state where it
can service API calls quickly enough, so the default retry strategy eventually
gives up and we see the error reported here.
I can reproduce this pretty reliably by running a unit test somewhere between
1K and 10K times. At first I assumed it only occurred when the system was under
load while running the test, but that doesn't appear to be the case.
Attempting to start up the container more than once might be the right thing to
do here. If the call to Minio fails while creating the default bucket, the
container should be discarded and a fresh one started. This should add no
overhead to the daily CI runs, since the retry only happens on the failure path.
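To make the idea concrete, here is a minimal sketch of the "discard and try again" approach. It is not the actual Flink change: the class name {{ContainerStartupRetry}}, the method {{startWithRetry}} and its parameters are hypothetical, and the container is modelled as a generic type so the sketch stays independent of Testcontainers.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: retries container startup with a fresh instance per attempt. */
final class ContainerStartupRetry {

    private ContainerStartupRetry() {}

    /**
     * Creates a container, starts and initializes it, and returns it on success.
     * On failure the container is discarded and a new one is created, up to
     * maxAttempts times in total.
     */
    static <C> C startWithRetry(
            Supplier<C> factory,
            Consumer<C> startAndInit,
            Consumer<C> discard,
            int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Always start from a fresh instance; a half-started one is never reused.
            C container = factory.get();
            try {
                startAndInit.accept(container);
                return container;
            } catch (RuntimeException e) {
                lastFailure = e;
                try {
                    discard.accept(container);
                } catch (RuntimeException cleanupFailure) {
                    // Keep the original failure and attach the cleanup problem to it.
                    e.addSuppressed(cleanupFailure);
                }
            }
        }
        throw new IllegalStateException(
                "Container did not start after " + maxAttempts + " attempts", lastFailure);
    }
}
```

In the test extension, {{factory}} would presumably build the Minio container, {{startAndInit}} would start it and create the default bucket, and {{discard}} would stop it. The happy path runs exactly one attempt, so the extra cost only shows up when startup actually fails.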
> MinioTestContainerTest failed due to IllegalStateException in container startup
> -------------------------------------------------------------------------------
>
> Key: FLINK-30141
> URL: https://issues.apache.org/jira/browse/FLINK-30141
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem, Tests
> Affects Versions: 1.17.0, 1.18.0
> Reporter: Matthias Pohl
> Priority: Major
> Labels: pull-request-available, test-stability
>
> [This build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43182&view=logs&j=a1ac4ce4-9a4f-5fdb-3290-7e163fba19dc&t=3a8f44aa-4415-5b14-37d5-5fecc568b139&l=15531]
> failed due to an {{IllegalStateException}} during container startup:
> {code:java}
> Nov 15 02:34:04 [ERROR] org.apache.flink.fs.s3.common.MinioTestContainerTest.testBucketCreation Time elapsed: 120.874 s <<< ERROR!
> Nov 15 02:34:04 org.testcontainers.containers.ContainerLaunchException: Container startup failed
> Nov 15 02:34:04 at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:345)
> Nov 15 02:34:04 at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:326)
> Nov 15 02:34:04 at org.apache.flink.core.testutils.TestContainerExtension.instantiateTestContainer(TestContainerExtension.java:59)
> Nov 15 02:34:04 at org.apache.flink.core.testutils.TestContainerExtension.before(TestContainerExtension.java:70)
> Nov 15 02:34:04 at org.apache.flink.core.testutils.EachCallbackWrapper.beforeEach(EachCallbackWrapper.java:45)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeEachCallbacks$2(TestMethodTestDescriptor.java:166)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$6(TestMethodTestDescriptor.java:202)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:202)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeEachCallbacks(TestMethodTestDescriptor.java:165)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:132)
> Nov 15 02:34:04 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.executeNonConcurrentTasks(ForkJoinPoolHierarchicalTestExecutorService.java:155)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:135)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:129)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> {code}