[ https://issues.apache.org/jira/browse/FLINK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dian Fu updated FLINK-20333:
----------------------------
Description:
Currently, a Flink standalone cluster throws a metaspace OOM after multiple
PyFlink UDF jobs have been submitted. The root cause is that the PyFlink
classes are loaded by the user classloader, so each job creates a separate
user classloader to load the PyFlink-related classes. Many soft references and
Finalizers remain in memory (introduced by the underlying Netty), and they
prevent the user classloaders of already finished PyFlink jobs from being
garbage collected.
Because of these references, multiple full GCs are needed to reclaim the
classloader of a completed job. If only one full GC runs before the metaspace
is exhausted, the OOM occurs.
was:
Currently, a Flink standalone cluster throws a metaspace OOM after multiple
PyFlink UDF jobs have been submitted. The root cause is that many soft
references and Finalizers remain in memory, and they prevent the classloader
of a finished PyFlink job from being garbage collected.
Because of these references, multiple full GCs are needed to reclaim the
classloader of a completed job. If only one full GC runs before the metaspace
is exhausted, the OOM occurs.
> Flink standalone cluster throws metaspace OOM after submitting multiple
> PyFlink UDF jobs.
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-20333
> URL: https://issues.apache.org/jira/browse/FLINK-20333
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Reporter: Wei Zhong
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently, a Flink standalone cluster throws a metaspace OOM after multiple
> PyFlink UDF jobs have been submitted. The root cause is that the PyFlink
> classes are loaded by the user classloader, so each job creates a separate
> user classloader to load the PyFlink-related classes. Many soft references
> and Finalizers remain in memory (introduced by the underlying Netty), and
> they prevent the user classloaders of already finished PyFlink jobs from
> being garbage collected.
> Because of these references, multiple full GCs are needed to reclaim the
> classloader of a completed job. If only one full GC runs before the
> metaspace is exhausted, the OOM occurs.
>
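
One way to observe the symptom described above is to read the JVM's metaspace
usage between job submissions. The sketch below is not taken from the issue or
its fix. The class name MetaspaceProbe and the helper metaspaceUsedBytes are
hypothetical, and the code assumes a HotSpot JVM, where class metadata is
reported by a memory pool named "Metaspace". It uses only the standard
java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceProbe {

    /**
     * Returns the bytes currently used by the memory pool named "Metaspace",
     * or -1 if the running JVM exposes no such pool (assumption: HotSpot naming).
     */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                // getUsage() may return null if the pool is no longer valid.
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = metaspaceUsedBytes();
        // System.gc() is only a hint; a single collection may not unload the
        // classes of a class loader that is still softly reachable or
        // awaiting finalization.
        System.gc();
        long after = metaspaceUsedBytes();
        System.out.println("Metaspace used before GC hint: " + before + " bytes");
        System.out.println("Metaspace used after GC hint:  " + after + " bytes");
    }
}
```

If the reported usage keeps growing across finished jobs and does not drop
after a single collection, that would be consistent with the class loaders of
completed jobs still being held, as the description suggests.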