sunjincheng created BEAM-9030:
---------------------------------

             Summary: Metaspace memory leak when running Python jobs with Flink runner
                 Key: BEAM-9030
                 URL: https://issues.apache.org/jira/browse/BEAM-9030
             Project: Beam
          Issue Type: Bug
          Components: java-fn-execution, runner-flink
            Reporter: sunjincheng
            Assignee: sunjincheng
             Fix For: 2.19.0


When a Python word count job is submitted repeatedly to a Flink session/standalone
cluster, the Metaspace usage of the Flink TaskManager increases continuously
(by about 40 MB per submission). The reason is that Beam classes are loaded by
Flink's user class loader, and problems in the implementations of
`ProcessManager` (from Beam) and `ThreadPoolCache` (from Netty) can prevent that
user class loader from being garbage collected even after the job has finished,
which eventually causes the Metaspace memory leak. See FLINK-15338[1] for more
information.
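To make the leak mechanism concrete, here is a minimal, self-contained Java sketch of one common way a per-job class loader stays reachable: a thread started while the job runs inherits the job's context class loader and keeps running after the job returns. This is an illustrative assumption only; it is not the exact reference chain in Beam's `ProcessManager` or Netty's `ThreadPoolCache`, and the class `ClassLoaderPinningDemo`, the method `demonstratePinning()` and the thread name `lingering-worker` are hypothetical. The `URLClassLoader` stands in for Flink's user class loader.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CountDownLatch;

public class ClassLoaderPinningDemo {

    /** Returns true if a thread started "inside the job" still references the job's loader. */
    static boolean demonstratePinning() throws InterruptedException, IOException {
        // Hypothetical stand-in for Flink's per-job user class loader (no job jar is loaded).
        URLClassLoader userLoader =
            new URLClassLoader(new URL[0], ClassLoaderPinningDemo.class.getClassLoader());
        ClassLoader previous = Thread.currentThread().getContextClassLoader();
        CountDownLatch release = new CountDownLatch(1);
        Thread lingering;
        try {
            // Simulate job code running with the user class loader as context loader.
            Thread.currentThread().setContextClassLoader(userLoader);
            // A new thread inherits the creating thread's context class loader.
            lingering = new Thread(() -> {
                try {
                    release.await(); // keeps running after the "job" has returned
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "lingering-worker");
            lingering.setDaemon(true);
            lingering.start();
        } finally {
            // The "job" is over: restore the original context class loader.
            Thread.currentThread().setContextClassLoader(previous);
        }
        // A live thread still holds a strong reference to the loader, so the loader
        // (and every class it defined) cannot be garbage collected.
        boolean pinned = lingering.isAlive() && lingering.getContextClassLoader() == userLoader;
        // Cleanup for the demo: stop the thread and close the loader.
        release.countDown();
        lingering.join();
        userLoader.close();
        return pinned;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("loader pinned by lingering thread: " + demonstratePinning());
    }
}
```

In a real TaskManager nothing calls `release.countDown()`, so every submission would leave one more loader (and its loaded Beam classes) reachable, which matches the steady per-job Metaspace growth described above.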

Regarding `ProcessManager`, I have created BEAM-9006[2] to track it.
Regarding `ThreadPoolCache`, it is a Netty problem that has been fixed in
NETTY#8955[3]. Netty 4.1.35.Final already includes this fix, and gRPC 1.22.0
already depends on Netty 4.1.35.Final. So we need to bump the gRPC version to
1.22.0+ (currently 1.21.0).
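The version requirement above can be expressed as a simple check. The sketch below is a hypothetical helper, not part of Beam, Netty or gRPC, and it is not how Beam manages its gRPC dependency. It assumes, based on this issue, that Netty 4.1.35.Final and gRPC 1.22.0 are the minimum versions that carry the NETTY#8955 fix, and it compares only the leading numeric components of a version string such as `4.1.35.Final`.

```java
public class NettyVersionCheck {

    // Assumed minimums, taken from this issue: Netty 4.1.35.Final includes the fix,
    // and gRPC 1.22.0 depends on that Netty release.
    static final String MIN_NETTY = "4.1.35";
    static final String MIN_GRPC = "1.22.0";

    /** Parses up to three leading numeric components, e.g. "4.1.35.Final" -> {4, 1, 35}. */
    static int[] parse(String version) {
        int[] out = new int[3];
        String[] parts = version.split("\\.");
        for (int i = 0; i < out.length && i < parts.length; i++) {
            // Stop at the first non-numeric component such as "Final".
            if (parts[i].isEmpty() || !parts[i].chars().allMatch(Character::isDigit)) {
                break;
            }
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Returns true if {@code version} is greater than or equal to {@code minimum}. */
    static boolean atLeast(String version, String minimum) {
        int[] v = parse(version);
        int[] m = parse(minimum);
        for (int i = 0; i < v.length; i++) {
            if (v[i] != m[i]) {
                return v[i] > m[i];
            }
        }
        return true; // all compared components are equal
    }

    public static void main(String[] args) {
        System.out.println("gRPC 1.21.0 ok? " + atLeast("1.21.0", MIN_GRPC));
        System.out.println("gRPC 1.22.0 ok? " + atLeast("1.22.0", MIN_GRPC));
        System.out.println("Netty 4.1.35.Final ok? " + atLeast("4.1.35.Final", MIN_NETTY));
    }
}
```

Comparing numeric components instead of raw strings avoids lexicographic mistakes such as `"1.3.0"` sorting after `"1.22.0"`.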

What do you think?

[1] https://issues.apache.org/jira/browse/FLINK-15338
[2] https://issues.apache.org/jira/browse/BEAM-9006
[3] https://github.com/netty/netty/pull/8955

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
