[jira] [Commented] (FLINK-25023) ClassLoader leak on JM/TM through indirectly-started Hadoop threads out of user code

Rui Li (Jira) Thu, 30 Dec 2021 19:24:05 -0800


    [ https://issues.apache.org/jira/browse/FLINK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467078#comment-17467078 ]


Rui Li commented on FLINK-25023:
--------------------------------

Hey [~keremulutas], when I investigated FLINK-15239, moving the Hadoop 
dependencies to the parent class loader didn't actually prevent the leak. It 
only mitigated the issue: one thread is leaked for all jobs, instead of one 
thread per submitted job. Sorry that my comment was misleading.

> ClassLoader leak on JM/TM through indirectly-started Hadoop threads out of 
> user code
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-25023
>                 URL: https://issues.apache.org/jira/browse/FLINK-25023
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Connectors / Hadoop 
> Compatibility, FileSystems
>    Affects Versions: 1.14.0, 1.12.5, 1.13.3
>            Reporter: Nico Kruber
>            Assignee: David Morávek
>            Priority: Major
>              Labels: pull-request-available
>
> If a Flink job uses HDFS through Flink's filesystem abstraction (on either 
> the JM or the TM), that code may spawn a few threads, e.g. from 
> static class members:
>  * {{org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner}}
>  * {{IPC Parameter Sending Thread#*}}
> These threads are started as soon as the classes are loaded, which may happen 
> in the context of the user code. In that case, the created threads may hold 
> references to the context class loader (I did not see that, though) or, as 
> happened here, may inherit thread contexts such as the 
> {{ProtectionDomain}} (from an {{AccessController}}).
> Hence, user contexts and user class loaders are leaked into long-running 
> threads that run in Flink's (parent) classloader.
> Fortunately, it seems to *leak only a single* {{ChildFirstClassLoader}} in 
> this concrete example, but that may depend on which code paths each client 
> execution walks.
>  
> A *proper solution* doesn't seem so simple:
>  * We could try to proactively initialize the available file systems in the 
> hope of starting all threads in the parent classloader with the parent context.
>  * We could create a default {{ProtectionDomain}} for spawned threads, as 
> discussed at [https://dzone.com/articles/javalangoutofmemory-permgen]. 
> However, the {{StatisticsDataReferenceCleaner}} is not actively spawned from 
> any callback; it is started from a static variable, i.e. during class 
> loading itself (but maybe this is still possible somehow).
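
The first proposal in the quoted description can be illustrated with a minimal, self-contained sketch. It uses only JDK classes, not Flink or Hadoop APIs; the class name `ContextLoaderSketch` and the helper `withContextClassLoader` are hypothetical, and an empty `URLClassLoader` stands in for the user-code class loader. It shows only that a new thread inherits the context class loader of the thread that creates it, and that temporarily switching to the parent loader before the thread is started avoids that. It does not cover the inherited `AccessControlContext` / `ProtectionDomain` path that the report actually observed, so it is at best a partial mitigation.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

public class ContextLoaderSketch {

    /** Starts a short-lived thread and returns the context class loader it inherited. */
    static ClassLoader loaderInheritedByNewThread() throws InterruptedException {
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(Thread.currentThread().getContextClassLoader()));
        t.start();
        t.join(); // join() makes the value written by the thread visible here
        return seen.get();
    }

    /**
     * Hypothetical helper: runs the action with the given loader as the context
     * class loader of the current thread and restores the previous one afterwards.
     */
    static <T> T withContextClassLoader(ClassLoader loader, Callable<T> action) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ContextLoaderSketch.class.getClassLoader();
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        // Assumption: an empty URLClassLoader plays the role of the user-code loader.
        try (URLClassLoader userLoader = new URLClassLoader(new URL[0], parent)) {
            Thread.currentThread().setContextClassLoader(userLoader);

            // A thread started while user code is on the stack keeps the user loader.
            ClassLoader leaked = loaderInheritedByNewThread();

            // A thread started while the parent loader is the context loader does not.
            ClassLoader safe =
                    withContextClassLoader(parent, ContextLoaderSketch::loaderInheritedByNewThread);

            System.out.println("inherited user loader: " + (leaked == userLoader));
            System.out.println("inherited parent loader: " + (safe == parent));
        } finally {
            Thread.currentThread().setContextClassLoader(original);
        }
    }
}
```

In a real setup the action passed to the helper would be whatever first touches the Hadoop classes, so that their static initializers, and the threads they start, run once under the parent context.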



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
