[ https://issues.apache.org/jira/browse/FLINK-26064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490456#comment-17490456 ]

David Morávek commented on FLINK-26064:
---------------------------------------

Reusing HTTP clients is not actually the problem; for some reason it just makes
the root cause more visible.

The actual problem is that if the underlying Netty event loop group is not
configured, the SharedSdkEventLoopGroup is used. This means that the clients
used by the test suite and the clients used by the actual pipeline share the
very same thread pool for handling HTTP communication.

By default the thread pool uses quite a high number of threads (I think
something like 2x the number of CPUs, but I would have to dive into Netty to
confirm this).

Now the problem is that a Netty thread inherits its context classloader from
the thread that created it. This could be either the classloader of the main
thread (if the thread was created by the test suite) or the FlinkUserClassLoader
(if the thread was created by the pipeline, in the TaskManager).
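
A minimal, stdlib-only sketch of the inheritance described above (no Netty or
AWS SDK involved; the class and method names are made up for illustration): a
java.lang.Thread takes its context classloader from the thread that constructs
it, and changing the creator's loader afterwards does not affect the new thread.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Flink or Netty.
final class ContextClassLoaderInheritance {

    // Constructs a thread while creatorLoader is the current thread's context
    // classloader, then reports which context classloader the new thread sees.
    static ClassLoader observedInChild(ClassLoader creatorLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread child;
        try {
            current.setContextClassLoader(creatorLoader);
            // The context classloader is copied from the creating thread here, at construction.
            child = new Thread(() -> seen.set(Thread.currentThread().getContextClassLoader()));
        } finally {
            // Restoring the creator's loader does not change what the child inherited.
            current.setContextClassLoader(previous);
        }
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        ClassLoader userLoader =
                new URLClassLoader(new URL[0], ContextClassLoaderInheritance.class.getClassLoader());
        System.out.println(observedInChild(userLoader) == userLoader); // prints "true"
    }
}
```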

There is a slight chance that, once the pipeline finishes (and the user
classloader is closed), we'll initiate a call on a thread that was created with
this classloader attached.

This has a couple of consequences:

    The call will simply fail.
    If two jobs are running on the same TM and both are using AWS-related
clients, they will also eventually fail.
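
The failure mode can be modelled without AWS at all. The sketch below is a
hypothetical, stdlib-only reproduction: a single-thread ExecutorService stands
in for the shared event loop group, and ClosableUserClassLoader imitates the
check that Flink's SafetyNetWrapperClassLoader performs (the real class is not
used). The "job" submits first, so the pool's only thread is created with the
job's classloader; after the job closes that loader, a later caller's lookup on
the same thread fails with an IllegalStateException, much like the quoted
stack trace.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the leak; no Netty, AWS SDK or Flink classes are involved.
final class LeakedClassLoaderDemo {

    // Imitates the closed-classloader check of Flink's SafetyNetWrapperClassLoader.
    static final class ClosableUserClassLoader extends ClassLoader {
        private volatile boolean closed;

        ClosableUserClassLoader(ClassLoader parent) {
            super(parent);
        }

        void close() {
            closed = true;
        }

        @Override
        public Enumeration<URL> getResources(String name) throws IOException {
            if (closed) {
                throw new IllegalStateException("Trying to access closed classloader.");
            }
            return super.getResources(name);
        }
    }

    // Runs a provider lookup through the pool thread's context classloader, the
    // same kind of call ServiceLoader makes for XMLInputFactory in the trace.
    static String lookupOnPool(ExecutorService pool) {
        Future<String> result = pool.submit(() -> {
            Thread.currentThread().getContextClassLoader()
                    .getResources("META-INF/services/javax.xml.stream.XMLInputFactory");
            return "ok";
        });
        try {
            return result.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    // Returns the outcome of the first (job) and second (later caller) lookups.
    static String[] runScenario() {
        // The worker thread is created lazily by whichever thread submits first.
        ExecutorService shared = Executors.newSingleThreadExecutor();
        ClosableUserClassLoader jobLoader =
                new ClosableUserClassLoader(LeakedClassLoaderDemo.class.getClassLoader());
        Thread caller = Thread.currentThread();
        ClassLoader original = caller.getContextClassLoader();
        String first;
        try {
            // Acting as the job: the pool's only thread now inherits jobLoader.
            caller.setContextClassLoader(jobLoader);
            first = lookupOnPool(shared);
        } finally {
            caller.setContextClassLoader(original);
        }
        jobLoader.close(); // the job finishes and its user classloader is closed
        String second = lookupOnPool(shared); // e.g. the test suite, reusing the same thread
        shared.shutdown();
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        String[] outcome = runScenario();
        System.out.println(outcome[0]); // ok
        System.out.println(outcome[1]); // failed: java.lang.IllegalStateException: Trying to access closed classloader.
    }
}
```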

The fix is fairly simple: we need to set separate event loop groups for the
tests and for each sink executed by the TM (if that causes too much overhead, we
can eventually try to find a way to reuse the ELG within the TM, e.g. some kind
of reference counting scoped per job).
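
One way to picture the per-job reference counting mentioned above, sketched
with plain java.util.concurrent types as a stand-in for Netty event loop
groups: each job gets its own pool, shared by that job's sinks and shut down on
the last release, so threads carrying one job's user classloader are never
handed to another job or to the tests. PerJobPoolRegistry, acquire and release
are hypothetical names, not Flink or AWS SDK API; a real version would
presumably hand out an SdkEventLoopGroup rather than an ExecutorService.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical registry; an ExecutorService stands in for a Netty event loop group.
final class PerJobPoolRegistry {

    private static final class Entry {
        final ExecutorService pool;
        int refCount;

        Entry(ExecutorService pool) {
            this.pool = pool;
        }
    }

    private final Map<String, Entry> pools = new HashMap<>();
    private final int threadsPerPool;

    PerJobPoolRegistry(int threadsPerPool) {
        this.threadsPerPool = threadsPerPool;
    }

    // Called when a sink of jobId builds its HTTP client; sinks of the same job share one pool.
    synchronized ExecutorService acquire(String jobId) {
        Entry entry = pools.computeIfAbsent(
                jobId, id -> new Entry(Executors.newFixedThreadPool(threadsPerPool)));
        entry.refCount++;
        return entry.pool;
    }

    // Called when that sink closes; the last release shuts the pool down together
    // with the job, so its threads are never reused by another job or by the tests.
    synchronized void release(String jobId) {
        Entry entry = pools.get(jobId);
        if (entry == null) {
            throw new IllegalStateException("No pool acquired for job " + jobId);
        }
        if (--entry.refCount == 0) {
            pools.remove(jobId);
            entry.pool.shutdown();
        }
    }
}
```

In this sketch the tests would simply create and close their own pool instead
of going through the registry.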

The proper place for the fix would be
AWSGeneralUtil#createAsyncHttpClient(software.amazon.awssdk.utils.AttributeMap,
software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient.Builder).

> KinesisFirehoseSinkITCase IllegalStateException: Trying to access closed classloader
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-26064
>                 URL: https://issues.apache.org/jira/browse/FLINK-26064
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: 1.15.0
>            Reporter: Piotr Nowojski
>            Assignee: Zichen Liu
>            Priority: Critical
>              Labels: pull-request-available
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31044&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d
> (shortened stack trace, as full is too large)
> {noformat}
> Feb 09 20:05:04 java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
> Feb 09 20:05:04       at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 09 20:05:04       at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> (...)
> Feb 09 20:05:04 Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
> Feb 09 20:05:04       at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
> Feb 09 20:05:04       at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
> Feb 09 20:05:04       at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)
> Feb 09 20:05:04       at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:200)
> Feb 09 20:05:04       at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:179)
> Feb 09 20:05:04       at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
> (...)
> Feb 09 20:05:04 Caused by: java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
> Feb 09 20:05:04       at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
> Feb 09 20:05:04       at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResources(FlinkUserCodeClassLoaders.java:188)
> Feb 09 20:05:04       at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
> Feb 09 20:05:04       at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
> Feb 09 20:05:04       at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
> Feb 09 20:05:04       at javax.xml.stream.FactoryFinder$1.run(FactoryFinder.java:352)
> Feb 09 20:05:04       at java.security.AccessController.doPrivileged(Native Method)
> Feb 09 20:05:04       at javax.xml.stream.FactoryFinder.findServiceProvider(FactoryFinder.java:341)
> Feb 09 20:05:04       at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:313)
> Feb 09 20:05:04       at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:227)
> Feb 09 20:05:04       at javax.xml.stream.XMLInputFactory.newInstance(XMLInputFactory.java:154)
> Feb 09 20:05:04       at software.amazon.awssdk.protocols.query.unmarshall.XmlDomParser.createXmlInputFactory(XmlDomParser.java:124)
> Feb 09 20:05:04       at java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(ThreadLocal.java:284)
> Feb 09 20:05:04       at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
> Feb 09 20:05:04       at java.lang.ThreadLocal.get(ThreadLocal.java:170)
> (...)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
