[ 
https://issues.apache.org/jira/browse/FLINK-30141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735782#comment-17735782
 ] 

Ryan Skraba commented on FLINK-30141:
-------------------------------------

I took a look at this and the related issue FLINK-26402. The 
*503 Service Unavailable* statuses are not rare: they occur roughly once in 
every couple hundred API calls to Minio on container startup.  On the other 
hand, the retry mechanism built into the Amazon API clients _usually_ retries 
until the call succeeds.  Sometimes, though, the Minio container doesn't reach 
a state where it can service API calls quickly enough, so the default retry 
strategy eventually gives up and we see the error here.

I can reproduce this pretty reliably by running a unit test somewhere between 
1K and 10K times.  At first I assumed it occurred when the system was under 
load while running the test, but that doesn't appear to be the case.

Attempting to start up the container more than once might be the right thing to 
do here.  If the call to Minio fails while creating the default bucket, the 
container should be discarded and a new one started.  This should add no 
overhead to the daily CI runs, since the retry only happens on the rare 
failure.
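
To make the idea concrete, here is a minimal sketch of a "discard and start a 
fresh container" loop.  It is not the actual Flink or Testcontainers code: 
StartupRetry, Startable and startWithRetries are hypothetical names, and 
Startable is only a stand-in for the real container type.  The initializer 
plays the role of the default bucket creation that occasionally hits the 503.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class StartupRetry {

    /** Hypothetical stand-in for a test container; not the Testcontainers API. */
    public interface Startable {
        void start();

        void stop();
    }

    /**
     * Creates a fresh container per attempt, starts it and runs the initializer
     * (e.g. creating the default bucket). If either step throws, the container
     * is stopped and discarded, and a new one is created for the next attempt.
     */
    public static <T extends Startable> T startWithRetries(
            Supplier<T> factory, Consumer<T> initializer, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T container = factory.get();
            try {
                container.start();
                initializer.accept(container);
                return container; // started and initialized successfully
            } catch (RuntimeException e) {
                lastFailure = e;
                try {
                    container.stop(); // discard the half-initialized container
                } catch (RuntimeException stopFailure) {
                    e.addSuppressed(stopFailure);
                }
            }
        }
        throw new IllegalStateException(
                "Container did not start after " + maxAttempts + " attempts", lastFailure);
    }
}
```

In the common case the first attempt succeeds and the loop body runs exactly 
once, so the extra cost is only paid when startup actually fails.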

> MinioTestContainerTest failed due to IllegalStateException in container 
> startup
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-30141
>                 URL: https://issues.apache.org/jira/browse/FLINK-30141
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Tests
>    Affects Versions: 1.17.0, 1.18.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available, test-stability
>
> [This build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43182&view=logs&j=a1ac4ce4-9a4f-5fdb-3290-7e163fba19dc&t=3a8f44aa-4415-5b14-37d5-5fecc568b139&l=15531]
> failed due to an {{IllegalStateException}} during container startup:
> {code:java}
> Nov 15 02:34:04 [ERROR] org.apache.flink.fs.s3.common.MinioTestContainerTest.testBucketCreation  Time elapsed: 120.874 s  <<< ERROR!
> Nov 15 02:34:04 org.testcontainers.containers.ContainerLaunchException: Container startup failed
> Nov 15 02:34:04       at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:345)
> Nov 15 02:34:04       at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:326)
> Nov 15 02:34:04       at org.apache.flink.core.testutils.TestContainerExtension.instantiateTestContainer(TestContainerExtension.java:59)
> Nov 15 02:34:04       at org.apache.flink.core.testutils.TestContainerExtension.before(TestContainerExtension.java:70)
> Nov 15 02:34:04       at org.apache.flink.core.testutils.EachCallbackWrapper.beforeEach(EachCallbackWrapper.java:45)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeEachCallbacks$2(TestMethodTestDescriptor.java:166)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$6(TestMethodTestDescriptor.java:202)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:202)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeEachCallbacks(TestMethodTestDescriptor.java:165)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:132)
> Nov 15 02:34:04       at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.executeNonConcurrentTasks(ForkJoinPoolHierarchicalTestExecutorService.java:155)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:135)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:129)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> Nov 15 02:34:04       at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
