[jira] [Commented] (FLINK-9080) Flink Scheduler goes OOM, suspecting a memory leak

Rohit Singh (JIRA) Thu, 29 Mar 2018 09:32:01 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419305#comment-16419305
 ]


Rohit Singh commented on FLINK-9080:
------------------------------------

Hi Till,

I tried Flink 1.4.2, but not 1.5.0.

I posted this on Stack Overflow:
https://stackoverflow.com/questions/49530333/getting-following-class-cast-exception-while-adding-job-jar-to-flink-home-lib

Based on that, I found out that our job has the following compile dependency:

compile group: 'org.apache.commons', name: 'commons-collections4', version: '4.1'

while Flink uses:

compile group: 'commons-collections', name: 'commons-collections', version: '3.2.2'

So I removed our dependency and used the same one Flink uses (version 3.2.2), but I was still getting the same error.
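When the same class name is defined by two different class loaders (for example, once from the Flink lib directory and once from the job jar), the JVM treats the two as distinct types, and a cast between them fails with a ClassCastException even though the names match. A small diagnostic like the following can show which loader and which jar a class actually came from. This is a sketch only: ClassOrigin and describeOrigin are hypothetical helper names, not Flink APIs. To see the user-code class loader, the method would have to be called from code running inside the job (for example, from an operator).

```java
import java.net.URL;
import java.security.CodeSource;

public final class ClassOrigin {

    // Describes which class loader defined the given class and, when known,
    // the jar or directory the class was loaded from.
    public static String describeOrigin(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        // A null loader means the class was defined by the JVM bootstrap loader.
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();

        // Bootstrap classes (and some generated classes) have no code source.
        String location = "unknown";
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source != null) {
            URL url = source.getLocation();
            if (url != null) {
                location = url.toString();
            }
        }
        return clazz.getName() + " loaded by " + loaderName + " from " + location;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names, e.g. a class from the library
        // suspected of being on the classpath twice.
        for (String name : args) {
            System.out.println(describeOrigin(Class.forName(name)));
        }
    }
}
```

If the same class name is reported with two different loaders or two different jar locations, that duplication is a likely source of the ClassCastException.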



I can try out the fix on the 1.5.0 release branch and share the results. Is there any fix
targeted at this issue? Also, in the long term, is there any plan to avoid
dynamic class loading on Mesos, or any other workaround to overcome the issue
apart from adding the jar to the Flink lib directory? Please let me know your thoughts on this.

 

 

> Flink Scheduler goes OOM, suspecting a memory leak
> --------------------------------------------------
>
>                 Key: FLINK-9080
>                 URL: https://issues.apache.org/jira/browse/FLINK-9080
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.4.0
>            Reporter: Rohit Singh
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: Top Level packages.JPG, Top level classes.JPG, 
> classesloaded vs unloaded.png
>
>
> Running Flink version 1.4.0 on Mesos, with the scheduler running along with the job
> manager in a single container, while the task managers run in separate
> containers.
> A couple of jobs were running continuously, and the Flink scheduler was working
> properly along with the task managers. Due to some change in the data, one of the jobs
> started failing continuously. In the meantime, there was a surge in Flink
> scheduler memory usage, and the scheduler eventually died with an OOM.
>  
> A memory dump analysis was done.
> These were the findings (see the attached images Top Level packages.JPG and Top level classes.JPG):
>  * The majority of the top packages retaining heap pointed to
> Flinkuserclassloader, glassfish (Jersey library), and Finalizer classes. (Top
> Level packages image)
>  * The top-level classes were of Flinkuserclassloader. (Top level classes image)
>  * The number of classes unloaded was quite low compared with the number loaded, in spite of
> adding the JVM options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled.
> Please find attached the classes loaded vs. unloaded graph; the scheduler was restarted 3 times.
>  * There were also custom classes that were duplicated during subsequent
> class uploads.
> Please find attached all the images of the heap dump. Can you suggest some pointers on how
> to overcome this issue?
>  
>  
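The "classes loaded vs. unloaded" trend described above can be sampled directly from the JVM's class-loading counters. Below is a minimal sketch using the standard java.lang.management.ClassLoadingMXBean. ClassLoadingMonitor, Snapshot, and the default sample count and interval are hypothetical choices for illustration. Run standalone, the program reports only its own JVM. The same counters are also exposed over JMX as the java.lang:type=ClassLoading MBean, which tools such as jconsole can read from a running JobManager process. The counters are JVM-wide, not per class loader.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public final class ClassLoadingMonitor {

    // Immutable snapshot of the JVM-wide class loading counters.
    public static final class Snapshot {
        public final long loaded;      // classes currently loaded
        public final long totalLoaded; // classes loaded since JVM start
        public final long unloaded;    // classes unloaded since JVM start

        Snapshot(long loaded, long totalLoaded, long unloaded) {
            this.loaded = loaded;
            this.totalLoaded = totalLoaded;
            this.unloaded = unloaded;
        }

        @Override
        public String toString() {
            return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                    loaded, totalLoaded, unloaded);
        }
    }

    // Reads the counters from the platform ClassLoadingMXBean.
    // Arguments are evaluated left to right, so totalLoaded >= loaded holds.
    public static Snapshot snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new Snapshot(
                bean.getLoadedClassCount(),
                bean.getTotalLoadedClassCount(),
                bean.getUnloadedClassCount());
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical defaults: 5 samples, 10 seconds apart.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        long intervalMillis = args.length > 1 ? Long.parseLong(args[1]) : 10_000L;
        for (int i = 0; i < samples; i++) {
            System.out.println(snapshot());
            if (i < samples - 1) {
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

-XX:+CMSClassUnloadingEnabled can only unload classes whose defining class loader has become unreachable. A "loaded" count that keeps rising across job restarts while "unloaded" stays flat is therefore consistent with old user-code class loaders still being referenced from somewhere.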



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
