[
https://issues.apache.org/jira/browse/FLINK-11205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110329#comment-17110329
]
Till Rohrmann commented on FLINK-11205:
---------------------------------------
I think the particular problem of increased metaspace usage due to rapid
failures has been solved with FLINK-16408, which avoids reloading classes by
reusing the user code class loader across restarts.
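
To illustrate why reusing the class loader helps, here is a minimal sketch
in plain Java. It is not the FLINK-16408 change itself. The class and method
names are hypothetical, and a dynamic proxy class stands in for a user code
class, on the assumption that what matters is only that each class loader
defines its own copy. The sketch also does not model garbage collection or
class unloading.

```java
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class ClassLoaderReuseSketch {

    // Hypothetical stand-in for a per-job user code class loader.
    static ClassLoader newUserCodeLoader() {
        return new URLClassLoader(
                new URL[0], ClassLoaderReuseSketch.class.getClassLoader());
    }

    // Stand-in for "loading user code": the proxy class is defined by the
    // given loader, so a new loader means a new class definition.
    static Class<?> loadUserClass(ClassLoader loader) {
        Object proxy = Proxy.newProxyInstance(
                loader,
                new Class<?>[] {Runnable.class},
                (p, method, methodArgs) -> null);
        return proxy.getClass();
    }

    // Simulates a number of job restarts and counts how many distinct
    // Class objects were defined along the way.
    static int distinctClassesAfterRestarts(int restarts, boolean reuseLoader) {
        Set<Class<?>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ClassLoader cached = null;
        for (int i = 0; i < restarts; i++) {
            ClassLoader loader;
            if (reuseLoader) {
                if (cached == null) {
                    cached = newUserCodeLoader();
                }
                loader = cached;
            } else {
                loader = newUserCodeLoader();
            }
            seen.add(loadUserClass(loader));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("fresh loader per restart: "
                + distinctClassesAfterRestarts(5, false));
        System.out.println("reused loader:            "
                + distinctClassesAfterRestarts(5, true));
    }
}
```

With a fresh loader per restart, every restart adds another class definition
that occupies metaspace until its loader is collected. With a reused loader
the definition is created once.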
> Task Manager Metaspace Memory Leak
> -----------------------------------
>
> Key: FLINK-11205
> URL: https://issues.apache.org/jira/browse/FLINK-11205
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Nawaid Shamim
> Priority: Critical
> Attachments: Screenshot 2018-12-18 at 12.14.11.png, Screenshot
> 2018-12-18 at 15.47.55.png
>
>
> Job restarts cause the task manager to dynamically load duplicate classes.
> Metaspace is unbounded and grows with every restart. YARN aggressively kills
> such containers, but the effect is immediately seen on a different task
> manager, which results in a death spiral.
> The task manager uses dynamic classloading as described in
> [https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html]
> {quote}
> *YARN*
> YARN classloading differs between single job deployments and sessions:
> * When submitting a Flink job/application directly to YARN (via {{bin/flink
> run -m yarn-cluster ...}}), dedicated TaskManagers and JobManagers are
> started for that job. Those JVMs have both Flink framework classes and user
> code classes in the Java classpath. That means that there is _no dynamic
> classloading_ involved in that case.
> * When starting a YARN session, the JobManagers and TaskManagers are started
> with the Flink framework classes in the classpath. The classes from all jobs
> that are submitted against the session are loaded dynamically.
> {quote}
> The above is not entirely true, especially when you set {{-yD
> classloader.resolve-order=parent-first}}. We also observed the above
> behaviour when submitting a Flink job/application directly to YARN (via
> {{bin/flink run -m yarn-cluster ...}}).
> !Screenshot 2018-12-18 at 12.14.11.png!
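
For readers unfamiliar with the {{classloader.resolve-order}} setting quoted
above, the following is a simplified sketch of the difference between the
two orders. It is not Flink's implementation. The class names and the
`create` helper are hypothetical, and the sketch leaves out the list of
package prefixes that a real child-first loader would always delegate to the
parent (for example `java.`).

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ResolveOrderSketch {

    /** Simplified child-first loader: own URLs first, parent as fallback. */
    static class ChildFirstLoader extends URLClassLoader {
        ChildFirstLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        // Look in the user jars before asking the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the user jars: use normal parent delegation.
                        c = super.loadClass(name, false);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Hypothetical helper that picks a loader for a resolve-order value. */
    static ClassLoader create(String resolveOrder, URL[] userJars, ClassLoader parent) {
        switch (resolveOrder) {
            case "parent-first":
                // Standard Java delegation: the parent is asked first.
                return new URLClassLoader(userJars, parent);
            case "child-first":
                return new ChildFirstLoader(userJars, parent);
            default:
                throw new IllegalArgumentException(
                        "Unknown resolve order: " + resolveOrder);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ResolveOrderSketch.class.getClassLoader();
        ClassLoader loader = create("child-first", new URL[0], parent);
        // With no user jars, every lookup falls through to the parent.
        System.out.println(loader.loadClass("java.lang.String") == String.class);
    }
}
```

In both orders the loader is still a separate, dynamically created class
loader. The setting only changes where a class is looked up first, so
creating a new loader on every restart can define classes again either way.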