[ 
https://issues.apache.org/jira/browse/FLINK-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050890#comment-17050890
 ] 

Xintong Song commented on FLINK-16406:
--------------------------------------

A summary of bad cases collected so far:
 * [~Jamalarm] reported in FLINK-16142. He hit both problems: a memory leak 
and an insufficient default size. He has not mentioned what metaspace size 
fixed his problem.
 * [~blablabla123] also reported the insufficient default size in FLINK-16142, 
fixed by increasing it to 256 MB.
 * [~nielsbasjes] reported in a [user ML 
thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kubernetes-java-lang-OutOfMemoryError-Metaspace-td33285.html]
 and in FLINK-16142. He described the problem as occurring when a job is 
executed repeatedly, which sounds like a memory leak to me. We 
need to investigate this further.
 * John Smith reported in another [user ML 
thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/MaxMetaspace-default-may-be-to-low-td33049.html]
 about an insufficient default size, fixed by increasing it to 256 MB. I 
believe he is also the one who opened FLINK-16278 ([~javadevmtl]).

> Increase default value for JVM Metaspace to minimise its OutOfMemoryError
> -------------------------------------------------------------------------
>
>                 Key: FLINK-16406
>                 URL: https://issues.apache.org/jira/browse/FLINK-16406
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Configuration, Runtime / Task
>    Affects Versions: 1.10.0
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.1, 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With FLIP-49 
> ([FLINK-13980|https://issues.apache.org/jira/browse/FLINK-13980]), we 
> introduced a limit for JVM Metaspace 
> ('taskmanager.memory.jvm-metaspace.size') when the TM JVM process is started. 
> This caused '_OutOfMemoryError: Metaspace_' for some use cases after upgrading 
> to the latest 1.10 version. In some cases, a real class loading leak has been 
> discovered, like in 
> [FLINK-16142|https://issues.apache.org/jira/browse/FLINK-16142]. Some users 
> had to increase the default value to accommodate their use cases (mostly 
> from 96 MB to 256 MB).
> While this limit was introduced to properly plan Flink resources, especially 
> in container environments, and to detect class loading leaks, the user 
> experience should be as smooth as possible. One way is to provide good 
> documentation for this change 
> ([FLINK-16278|https://issues.apache.org/jira/browse/FLINK-16278]).
> Another question is the sanity of the default value. It is still arguable 
> what the default value should be (currently 96 MB). In general, the size 
> depends on the use case (job user code, how many jobs are deployed in the 
> cluster, etc.).
> This issue tries to tackle this problem by first increasing the limit to 
> 256 MB and the overall default process size to 1728 MB in flink-conf.yaml, so 
> that the default sizes of other memory components are not affected. We also 
> want to poll which Metaspace setting resolved the _OutOfMemoryError_. If you 
> encountered this problem and there was no class loading leak, please report 
> here any relevant specifics of your job and your Metaspace size.
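
For readers who want to try the values discussed in the description above, a 
minimal flink-conf.yaml sketch might look like the following. The Metaspace key 
is quoted from the issue. The process size key 
('taskmanager.memory.process.size') is an assumption about which option the 
"overall default process size" refers to.

```yaml
# flink-conf.yaml (fragment, illustrative only)

# Limit for JVM Metaspace of the TaskManager process.
# 256m is the value proposed in this issue (previously 96m).
taskmanager.memory.jvm-metaspace.size: 256m

# Total memory of the TaskManager process (key name is an assumption, see above).
# 1728m is the default proposed in this issue.
taskmanager.memory.process.size: 1728m
```

The Metaspace limit grows by 256 MB - 96 MB = 160 MB. Assuming the previous 
default process size was 1568 MB, the process size grows by the same 160 MB, 
which would explain why the other memory components keep their default sizes.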



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
