[
https://issues.apache.org/jira/browse/FLINK-25948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zichen Liu updated FLINK-25948:
-------------------------------
Description:
Add a .close() call for BOTH the KDSAsyncClient and the underlying HTTPClient,
for BOTH KDS & KDF.
Maintain a reference to the HTTPClient at all times so that it can be closed.
We still need to find an appropriate point in the Sink/SinkWriter to close them.
was:
Intermittent failures introduced as part of merge (PR#18314:
[FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis
Firehose|https://github.com/apache/flink/pull/18314]):
# Failures are intermittent, affecting c. 1 in 7 builds on
{{flink-ci.flink}} and {{flink-ci.flink-master-mirror}}.
# The issue looks identical to the KinesaliteContainer startup issue (Appendix
1).
# I have managed to reproduce the issue locally: if I start some parallel
containers, keep them running, and then run {{KinesisFirehoseSinkITCase}},
c. 1 in 6 runs give the error.
# The errors look slightly different on {{flink-ci.flink-master-mirror}} than
on {{flink-ci.flink}}, which looks the same as local. I hope this is only a
difference in logging/killing environment variables, and that there aren’t 2
distinct issues.
Appendix 1:
{code:java}
org.testcontainers.containers.ContainerLaunchException: Container startup failed
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
    at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
    at
    ... 11 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
    at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
    ... 12 more
Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
    at
{code}
> KDS / KDF Sink should call .close() to clean up resources
> ---------------------------------------------------------
>
> Key: FLINK-25948
> URL: https://issues.apache.org/jira/browse/FLINK-25948
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Common
> Affects Versions: 1.15.0
> Reporter: Zichen Liu
> Assignee: Ahmed Hamdy
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> Add a .close() call for BOTH the KDSAsyncClient and the underlying HTTPClient,
> for BOTH KDS & KDF.
> Maintain a reference to the HTTPClient at all times so that it can be closed.
> We still need to find an appropriate point in the Sink/SinkWriter to close them.
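
For illustration only, here is a minimal sketch of the cleanup described in this issue, assuming the HTTP client is created separately from the async client and so has to be closed by whoever holds it. All class names (FakeHttpClient, FakeAsyncClient, SketchSinkWriter) are hypothetical stand-ins, not the actual Flink or AWS SDK types, and the close order shown (async client first, then HTTP client) is an assumption rather than the order chosen in the fix.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {

    /** Hypothetical stand-in for the underlying HTTP client. */
    static final class FakeHttpClient implements AutoCloseable {
        private final List<String> log;

        FakeHttpClient(List<String> log) {
            this.log = log;
        }

        @Override
        public void close() {
            log.add("http-client-closed");
        }
    }

    /** Hypothetical stand-in for the async KDS/KDF client built on the HTTP client. */
    static final class FakeAsyncClient implements AutoCloseable {
        private final List<String> log;

        FakeAsyncClient(FakeHttpClient httpClient, List<String> log) {
            this.log = log;
        }

        @Override
        public void close() {
            log.add("async-client-closed");
        }
    }

    /** Hypothetical writer that keeps a reference to BOTH clients so it can close both. */
    static final class SketchSinkWriter implements AutoCloseable {
        private final FakeHttpClient httpClient;
        private final FakeAsyncClient asyncClient;

        SketchSinkWriter(List<String> log) {
            this.httpClient = new FakeHttpClient(log);
            this.asyncClient = new FakeAsyncClient(httpClient, log);
        }

        @Override
        public void close() {
            try {
                asyncClient.close();
            } finally {
                // Runs even if closing the async client throws.
                httpClient.close();
            }
        }
    }

    /** Opens and closes a writer, returning the recorded close events in order. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (SketchSinkWriter writer = new SketchSinkWriter(log)) {
            // Records would be written here.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Holding the HTTP client in its own field is what makes the second close possible; the try/finally ensures the HTTP client is still released if closing the async client fails.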
--
This message was sent by Atlassian Jira
(v8.20.1#820001)