[
https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419021#comment-16419021
]
Rohit Singh commented on FLINK-9080:
------------------------------------
Are there any updates on this issue? The current workaround is to restart the scheduler,
which causes data loss.
Please have a look; this is becoming a showstopper for us.
> Flink Scheduler goes OOM, suspecting a memory leak
> --------------------------------------------------
>
> Key: FLINK-9080
> URL: https://issues.apache.org/jira/browse/FLINK-9080
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.4.0
> Reporter: Rohit Singh
> Priority: Critical
> Attachments: Top Level packages.JPG, Top level classes.JPG,
> classesloaded vs unloaded.png
>
>
> Running Flink version 1.4.0 on Mesos, with the scheduler running along with the job
> manager in a single container, whereas the task managers run in separate
> containers.
> A couple of jobs were running continuously, and the Flink scheduler was working
> properly along with the task managers. Due to some change in data, one of the jobs
> started failing continuously. In the meantime, there was a surge in Flink
> scheduler memory usage, and it eventually died of OOM.
>
> A memory dump analysis was done.
> The findings were as follows: !Top Level packages.JPG!!Top level classes.JPG!
> * The majority of the top loaded packages retaining heap pointed to
> Flinkuserclassloader, glassfish (jersey library), and Finalizer classes (Top
> level packages image).
> * The top-level classes were of Flinkuserclassloader (Top level classes image).
> * The number of classes unloaded was very low compared with the number loaded, in spite of
> adding the JVM options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
> (PFA the classes loaded vs unloaded graph; the scheduler was restarted 3 times).
> * There were custom classes as well, which were duplicated during subsequent
> class uploads.
> PFA all the images of the heap dump. Can you suggest some pointers on how
> to overcome this issue?
>
>
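For anyone trying to reproduce the loaded vs unloaded class counts mentioned in the description without a full heap dump, the following is a minimal sketch using the standard JDK ClassLoadingMXBean. The class name ClassLoadStats and the method snapshot() are hypothetical names chosen for illustration; this is not part of Flink and only reports JVM-wide counters for the process it runs in.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Flink): reports JVM-wide class loading counters.
public class ClassLoadStats {

    // Returns a one-line summary of currently loaded, total loaded,
    // and unloaded class counts for the current JVM.
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),       // classes currently loaded
                bean.getTotalLoadedClassCount(),  // classes loaded since JVM start
                bean.getUnloadedClassCount());    // classes unloaded since JVM start
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If the unloaded count stays near zero while the total loaded count keeps growing across job restarts, that would be consistent with the class loader retention described in the findings, though it does not by itself identify what is holding the references.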
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)