[ https://issues.apache.org/jira/browse/FLINK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562999#comment-17562999 ]

Chesnay Schepler commented on FLINK-28248:
------------------------------------------

Where is that ThreadLocal coming from? Do you use ThreadLocals in your user code?

ThreadLocals are highly problematic in systems where long-running threads (like 
those from Flink) cross classloader boundaries, because users typically don't 
bother removing entries from the map again.

I'm not aware of us using ThreadLocals in the runtime, so I'd think it's 
either in your code, Beam or some other library. In that case there's nothing 
we can do on our end.
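
For illustration, here is a minimal sketch of the leak pattern; the names 
PerJobCache, process() and cleanup() are made up for this example and are not 
Flink or Beam API:

{code:java}
// Sketch of how a ThreadLocal set from user code on a long-running task
// thread can pin the per-job user classloader.
public class ThreadLocalLeakExample {

    // Some class loaded per job via Flink's ChildFirstClassLoader.
    static class PerJobCache { /* ... */ }

    // The value stored here is an instance of a user-code class.
    private static final ThreadLocal<PerJobCache> CACHE = new ThreadLocal<>();

    // Called from user code on a long-running Flink task thread.
    static void process() {
        if (CACHE.get() == null) {
            CACHE.set(new PerJobCache());
        }
        // ... use CACHE.get() ...
        // Leak: the task thread outlives the job. Its ThreadLocalMap holds a
        // strong reference to the PerJobCache value, which references its
        // Class, which references the ChildFirstClassLoader, so the whole
        // classloader (and its metaspace) cannot be unloaded.
    }

    // Fix: clear the entry once the work is done, e.g. from the user
    // function's close() method.
    static void cleanup() {
        CACHE.remove();
    }
}
{code}

Note that the weak reference in the ThreadLocalMap only covers the key; the 
value stays strongly reachable from the thread until remove() is called (or 
the thread dies), which on long-running task threads in practice means never.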

> Metaspace memory is leaking when repeatedly submitting Beam batch pipelines 
> via the REST API
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28248
>                 URL: https://issues.apache.org/jira/browse/FLINK-28248
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core
>    Affects Versions: 1.14.4
>            Reporter: Arkadiusz Gasinski
>            Priority: Major
>         Attachments: image-2022-06-24-14-45-51-689.png, 
> image-2022-06-24-14-51-47-909.png, image-2022-06-24-15-07-43-035.png, 
> image-2022-07-05-15-47-45-038.png, image-2022-07-05-15-51-05-840.png, 
> image-2022-07-05-15-58-43-448.png
>
>
> We have a Flink cluster running on k8s/OpenShift in session mode, executing 
> our Apache Beam pipelines. Some of these pipelines are streaming pipelines 
> that run continuously; others are batch pipelines submitted periodically 
> whenever there is load to be processed.
> We believe that the batch pipelines cause the issue. We submit one to 
> several batch jobs every 5 minutes. For each job a new ChildFirstClassLoader 
> instance is created, and it looks like these instances are not closed 
> properly after the job finishes.
> Attached is a screenshot from the Eclipse Memory Analyzer's Leak Suspects 
> report. When the heap dump was captured, there were 2 streaming and several 
> batch jobs running, plus over 100 finished batch jobs.
> !image-2022-06-24-14-45-51-689.png!
> In our current setup, we allocate 8GB for the metaspace:
> !image-2022-06-24-14-51-47-909.png!
>  
> And the top components from the memory analyzer:
> !image-2022-06-24-15-07-43-035.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
