Hi Flink community,

I am running a session cluster with 1 GB of JVM metaspace. Each time I submit and then cancel a Flink job that uses a Python UDF, I notice that the metaspace gradually increases until the task manager is eventually killed with an out-of-memory exception.
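For reference, this is the relevant setting in my conf/flink-conf.yaml (other memory settings omitted):

```yaml
# TaskManager JVM metaspace limit; the cluster currently runs with 1 GB.
taskmanager.memory.jvm-metaspace.size: 1gb
```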
To reproduce this error locally, I installed Flink 1.16.1 and PyFlink 1.16.1 with Python 3.9. Using the word count example from https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/ I submit the job to my local cluster with

    ./flink-1.16.1/bin/flink run -pyexec /opt/homebrew/bin/python3.9 --python wordcount.py

and then cancel the running job. After repeating this a number of times, I can see in the Flink UI that the metaspace is gradually increasing, until the task manager crashes with the following exception:

    2023-03-29 10:17:19,270 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Fatal error occurred while executing the TaskManager. Shutting it down...
    java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
        at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
        at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]

I noticed that a similar issue caused by a leaky class loader was reported in https://issues.apache.org/jira/browse/FLINK-15338, but that was fixed in version 1.10. Has anyone else encountered similar issues?

Thanks,
Tom
