[ https://issues.apache.org/jira/browse/FLINK-32203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksandr Nitavskyi updated FLINK-32203:
----------------------------------------
    Description: 
*Context*

We have encountered a ClassLoader-related memory leak in Apache Flink: the 
ChildFirstClassLoader is not garbage collected when a job is restarted.
A heap dump showed that Log4j starts a configuration watch thread that holds a 
strong reference to the ChildFirstClassLoader via its AccessControlContext. Since 
the thread is never stopped, the ChildFirstClassLoader is never cleaned up. 

Removing the monitorInterval setting introduced in FLINK-20510 helps mitigate the 
issue; I believe this change could be applied to the default log4j configuration.

*How to reproduce*
Deploy a Flink job that uses a Hadoop file system (e.g. s3a). Redeploy the job; 
in a Task Manager heap dump you should see multiple 

*AC*
With the default configuration, Flink users do not easily run into this memory 
leak.

  was:
*Context*

We have encountered a ClassLoader-related memory leak in Apache Flink: the 
ChildFirstClassLoader is not garbage collected when a job is restarted.
A heap dump showed that Log4j starts a configuration watch thread that holds a 
strong reference to the ChildFirstClassLoader via its AccessControlContext. Since 
the thread is never stopped, the ChildFirstClassLoader is never cleaned up. 

Removing the monitorInterval setting introduced in FLINK-20510 helps mitigate the 
issue; I believe this change could be applied to the default log4j configuration.

*AC*
With the default configuration, Flink users do not easily run into this memory 
leak.


> Potential ClassLoader memory leak due to log4j configuration
> ------------------------------------------------------------
>
>                 Key: FLINK-32203
>                 URL: https://issues.apache.org/jira/browse/FLINK-32203
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Oleksandr Nitavskyi
>            Priority: Major
>         Attachments: classloader_leak.png, 
> stack_trace_example_with_log4j_creation_on_job_reload.log



--
This message was sent by Atlassian Jira
(v8.20.10#820010)