[ 
https://issues.apache.org/jira/browse/FLINK-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zichen Liu updated FLINK-26309:
-------------------------------
    Description: 
The Localstack container is used to mock AWS services so that tests can hit a 
mock endpoint. When we start the container, tests occasionally fail with the 
exception: {{IOException - closed before protocol could be determined.}} This 
happens whenever we make a request (e.g. read from mock S3) before the 
container is ready**. A similar issue was resolved in Kinesalite by 
increasing the timeout from 1s to 10s, but Localstack is a larger container. 
In #18887 we increased the timeout to 30s, which should be sufficient for it 
to start. However, it would be better to use [this 
strategy|https://ignas.me/tech/waiting-localstack-s3-start/] to poll for 
readiness and know sooner.

** This is a known limitation; please see issue 
[#1202|https://github.com/localstack/localstack/issues/1202] in 
localstack/localstack or this [blog 
post|https://ignas.me/tech/waiting-localstack-s3-start/].
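
The polling idea described above can be sketched roughly as follows. This is a minimal illustration, not the change made in Flink: the class name {{ReadinessPoller}}, the method {{waitUntilReady}} and the {{probe}} parameter are hypothetical names introduced here. It assumes readiness can be checked by a cheap request that either fails or throws while the container is still starting.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Flink or Testcontainers): polls a
// readiness probe until it succeeds or a timeout elapses.
final class ReadinessPoller {

    private ReadinessPoller() {}

    /**
     * Calls the probe repeatedly, sleeping for the given interval between
     * attempts. Returns true as soon as the probe reports ready, or false
     * if the timeout elapses or the thread is interrupted first.
     */
    static boolean waitUntilReady(
            BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Assumption: a request made before the container is ready
                // fails with an exception, so treat it as "not ready yet".
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, the probe would be a small request against the mock endpoint (for example, a read from mock S3) that returns true once it succeeds. Compared with a fixed 30s wait, the test proceeds as soon as the container answers, and the timeout only acts as an upper bound.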

 

  was:
The Firehose sink is an at-least-once sink, but we only expect duplicates 
during failures and restores from savepoints/checkpoints. During 
`KinesisFirehoseSinkITCase` there is no such action, and yet we occasionally 
get duplicates in the test result. The test was originally, and erroneously, 
asserting exactly-once; this has been fixed in #18876 to assert at-least-once. 
However, a curiosity still remains: why were there duplicates?

That is the purpose of this investigation.
{code:java}
Feb 22 02:47:37 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 83.215 s <<< FAILURE! - in org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase
Feb 22 02:47:37 [ERROR] org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase.test  Time elapsed: 50.712 s  <<< FAILURE!
Feb 22 02:47:37 org.opentest4j.AssertionFailedError: 
Feb 22 02:47:37 
Feb 22 02:47:37 expected: 92
Feb 22 02:47:37  but was: 93
Feb 22 02:47:37         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Feb 22 02:47:37         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Feb 22 02:47:37         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Feb 22 02:47:37         at org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase.test(KinesisFirehoseSinkITCase.java:133)
Feb 22 02:47:37         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 22 02:47:37         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 22 02:47:37         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 22 02:47:37         at java.lang.reflect.Method.invoke(Method.java:498)
Feb 22 02:47:37         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Feb 22 02:47:37         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 22 02:47:37         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Feb 22 02:47:37         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 22 02:47:37         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Feb 22 02:47:37         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Feb 22 02:47:37         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Feb 22 02:47:37         at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Feb 22 02:47:37         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Feb 22 02:47:37         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Feb 22 02:47:37         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Feb 22 02:47:37         at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Feb 22 02:47:37         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Feb 22 02:47:37         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Feb 22 02:47:37         at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Feb 22 02:47:37         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Feb 22 02:47:37         at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:30)
Feb 22 02:47:37         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 22 02:47:37         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Feb 22 02:47:37         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Feb 22 02:47:37         at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Feb 22 02:47:37         at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Feb 22 02:47:37         at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Feb 22 02:47:37         at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
 {code}
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31983&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=44249]


> Add a polling strategy to determine whether Localstack test container has 
> started
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-26309
>                 URL: https://issues.apache.org/jira/browse/FLINK-26309
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Connectors / Common
>            Reporter: Zichen Liu
>            Assignee: Zichen Liu
>            Priority: Minor
>              Labels: curiosity, investigation



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
