[
https://issues.apache.org/jira/browse/FLINK-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zichen Liu updated FLINK-26309:
-------------------------------
Description:
The Localstack container is used to mock AWS services so that tests can hit a
mock endpoint. When we start the container, tests occasionally fail with the
exception: {{IOException - closed before protocol could be determined.}} This
happens whenever we make a request (e.g. a read from mock S3) before the
container is ready (see the note below). A similar issue was resolved for
Kinesalite by increasing the timeout from 1s to 10s, but Localstack is a
heavier container and may need longer to start. In #18887 we increased the
timeout to 30s, which should be sufficient for it to start. However, it would
be better to poll for readiness using [this
strategy|https://ignas.me/tech/waiting-localstack-s3-start/], so that tests can
proceed as soon as the container is up instead of relying on a fixed timeout.
Note: this is a known limitation; please see issue
[#1202|https://github.com/localstack/localstack/issues/1202] in
localstack/localstack or this [blog
post|https://ignas.me/tech/waiting-localstack-s3-start/].
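For illustration only, a minimal sketch of what such a polling strategy could look like. The class {{ReadinessPoller}} and the method {{waitUntilReady}} are hypothetical names, not existing Flink, Testcontainers, or Localstack APIs. The probe is assumed to be any cheap request against the mock endpoint, and an exception from the probe is assumed to mean "not ready yet".

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Flink, Testcontainers, or Localstack. */
final class ReadinessPoller {

    private ReadinessPoller() {}

    /**
     * Runs the probe repeatedly until it returns true or the timeout elapses.
     * Any exception thrown by the probe (other than an interrupt) is treated
     * as "container not ready yet" and triggers another attempt.
     *
     * @return true if the probe succeeded before the deadline, false otherwise
     */
    static boolean waitUntilReady(Callable<Boolean> probe, Duration timeout, Duration interval)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (Boolean.TRUE.equals(probe.call())) {
                    return true;
                }
            } catch (InterruptedException e) {
                // Do not swallow interrupts; let the caller decide what to do.
                throw e;
            } catch (Exception e) {
                // Assumed to be the "closed before protocol could be determined"
                // style of failure seen while the container is still starting.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

In a test setup, the probe could be a lightweight read against the mock S3 endpoint, with the existing 30s value kept as the upper bound. The test then proceeds as soon as the first probe succeeds rather than always waiting for the full timeout.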
was:
The Firehose sink is an at-least-once sink, but we only expect duplicates
during failures and restores from savepoints/checkpoints. During
`KinesisFirehoseSinkITCase` no such action occurs, and yet we occasionally
get duplicates in the test result. The test originally asserted exactly-once
delivery in error; this was fixed in #18876 to assert at-least-once.
However, a question remains: why were there duplicates?
That is the purpose of this investigation.
{code:java}
Feb 22 02:47:37 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 83.215 s <<< FAILURE! - in org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase
Feb 22 02:47:37 [ERROR] org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase.test Time elapsed: 50.712 s <<< FAILURE!
Feb 22 02:47:37 org.opentest4j.AssertionFailedError:
Feb 22 02:47:37
Feb 22 02:47:37 expected: 92
Feb 22 02:47:37 but was: 93
Feb 22 02:47:37 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Feb 22 02:47:37 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Feb 22 02:47:37 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Feb 22 02:47:37 at org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase.test(KinesisFirehoseSinkITCase.java:133)
Feb 22 02:47:37 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 22 02:47:37 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 22 02:47:37 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 22 02:47:37 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 22 02:47:37 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Feb 22 02:47:37 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 22 02:47:37 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Feb 22 02:47:37 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 22 02:47:37 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Feb 22 02:47:37 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Feb 22 02:47:37 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Feb 22 02:47:37 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Feb 22 02:47:37 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Feb 22 02:47:37 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Feb 22 02:47:37 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Feb 22 02:47:37 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Feb 22 02:47:37 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Feb 22 02:47:37 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Feb 22 02:47:37 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Feb 22 02:47:37 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Feb 22 02:47:37 at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:30)
Feb 22 02:47:37 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 22 02:47:37 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Feb 22 02:47:37 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Feb 22 02:47:37 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Feb 22 02:47:37 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Feb 22 02:47:37 at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Feb 22 02:47:37 at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
{code}
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31983&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=44249]
> Add a polling strategy to determine whether Localstack test container has
> started
> ---------------------------------------------------------------------------------
>
> Key: FLINK-26309
> URL: https://issues.apache.org/jira/browse/FLINK-26309
> Project: Flink
> Issue Type: Technical Debt
> Components: Connectors / Common
> Reporter: Zichen Liu
> Assignee: Zichen Liu
> Priority: Minor
> Labels: curiosity, investigation
>