[ https://issues.apache.org/jira/browse/FLINK-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050890#comment-17050890 ]
Xintong Song commented on FLINK-16406:
--------------------------------------

A summary of bad cases collected so far:
* [~Jamalarm] reported in FLINK-16142. He had both problems: a memory leak and an insufficient default size. He has not mentioned how large a metaspace size fixes his problem.
* [~blablabla123] also reported the insufficient default size in FLINK-16142, fixed by increasing it to 256 MB.
* [~nielsbasjes] reported in a [user ML thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kubernetes-java-lang-OutOfMemoryError-Metaspace-td33285.html] and in FLINK-16142. He described the problem as occurring when a job is executed repeatedly, which sounds like a memory leak to me. We would need to investigate this further.
* John Smith reported the insufficient default size in another [user ML thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/MaxMetaspace-default-may-be-to-low-td33049.html], fixed by increasing it to 256 MB. I believe he is also the one who opened FLINK-16278 ([~javadevmtl]).

> Increase default value for JVM Metaspace to minimise its OutOfMemoryError
> -------------------------------------------------------------------------
>
>                 Key: FLINK-16406
>                 URL: https://issues.apache.org/jira/browse/FLINK-16406
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Configuration, Runtime / Task
>    Affects Versions: 1.10.0
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.1, 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With FLIP-49
> ([FLINK-13980|https://issues.apache.org/jira/browse/FLINK-13980]), we
> introduced a limit for JVM Metaspace
> ('taskmanager.memory.jvm-metaspace.size') when the TM JVM process is
> started. It caused '_OutOfMemoryError: Metaspace_' for some use cases after
> upgrading to the latest 1.10 version.
> In some cases, a real class loading leak has been discovered, like in
> [FLINK-16142|https://issues.apache.org/jira/browse/FLINK-16142]. Some users
> had to increase the default value to accommodate their use cases (mostly
> from 96 MB to 256 MB).
> While this limit was introduced to properly plan Flink resources, especially
> in container environments, and to detect class loading leaks, the user
> experience should be as smooth as possible. One way is to provide good
> documentation for this change
> ([FLINK-16278|https://issues.apache.org/jira/browse/FLINK-16278]).
> Another question is the sanity of the default value. It is still arguable
> what the default value should be (currently 96 MB). In general, the size
> depends on the use case (job user code, how many jobs are deployed in the
> cluster, etc.).
> This issue tries to tackle the problem by first increasing the default to
> 256 MB and the overall default process size to 1728 MB in flink-conf.yaml,
> so that the default sizes of other memory components are not affected. We
> also want to poll which Metaspace setting resolved the _OutOfMemoryError_.
> If you encountered this problem and there was no class loading leak, please
> report here any relevant specifics of your job and your Metaspace size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)