[
https://issues.apache.org/jira/browse/FLINK-32203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oleksandr Nitavskyi updated FLINK-32203:
----------------------------------------
Description:
*Context*
We have encountered a ClassLoader-related memory leak in Apache Flink.
ChildFirstClassLoader is not garbage collected when a job is restarted.
A heap dump shows that Log4j starts a configuration watch thread, which holds
a strong reference to ChildFirstClassLoader via AccessControlContext. Since the
thread is never stopped, the ChildFirstClassLoader is never collected.
Removing the monitorInterval setting introduced in FLINK-20510 helps to
mitigate the issue; I believe this could be applied to the default log4j
config.
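As a rough illustration of the mitigation, and assuming the default
conf/log4j.properties contains a line such as monitorInterval=30 (the exact
value and file location are assumptions and may differ between Flink
versions), it would amount to commenting that line out:

```properties
# Sketch only: the value 30 and the surrounding layout are assumed, not
# copied from a specific Flink release.
# With monitorInterval set, Log4j 2 periodically re-checks this file for
# changes, which needs a background watcher thread. Commenting the line out
# disables that periodic check.
# monitorInterval=30
```

The trade-off is that logging configuration changes are then no longer picked
up at runtime and need a process restart to take effect.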
*How to reproduce*
Deploy a Flink job that uses a Hadoop file system (e.g. s3a). Redeploy the job
-> in the Task Manager heap dump you should see multiple
*AC*
The default configuration shipped to Flink users does not easily lead to a
memory leak.
was:
*Context*
We have encountered a ClassLoader-related memory leak in Apache Flink.
ChildFirstClassLoader is not garbage collected when a job is restarted.
A heap dump shows that Log4j starts a configuration watch thread, which holds
a strong reference to ChildFirstClassLoader via AccessControlContext. Since the
thread is never stopped, the ChildFirstClassLoader is never collected.
Removing the monitorInterval setting introduced in FLINK-20510 helps to
mitigate the issue; I believe this could be applied to the default log4j
config.
*AC*
The default configuration shipped to Flink users does not easily lead to a
memory leak.
> Potential ClassLoader memory leak due to log4j configuration
> ------------------------------------------------------------
>
> Key: FLINK-32203
> URL: https://issues.apache.org/jira/browse/FLINK-32203
> Project: Flink
> Issue Type: Bug
> Reporter: Oleksandr Nitavskyi
> Priority: Major
> Attachments: classloader_leak.png,
> stack_trace_example_with_log4j_creation_on_job_reload.log
>
>
> *Context*
> We have encountered a ClassLoader-related memory leak in Apache Flink.
> ChildFirstClassLoader is not garbage collected when a job is restarted.
> A heap dump shows that Log4j starts a configuration watch thread, which
> holds a strong reference to ChildFirstClassLoader via AccessControlContext.
> Since the thread is never stopped, the ChildFirstClassLoader is never
> collected.
> Removing the monitorInterval setting introduced in FLINK-20510 helps to
> mitigate the issue; I believe this could be applied to the default log4j
> config.
> *How to reproduce*
> Deploy a Flink job that uses a Hadoop file system (e.g. s3a). Redeploy the
> job -> in the Task Manager heap dump you should see multiple
> *AC*
> The default configuration shipped to Flink users does not easily lead to a
> memory leak.