[ https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414093#comment-16414093 ]
Rohit Singh commented on FLINK-9080:
------------------------------------
Following the Flink documentation on debugging class loading, I tried adding the
job jar to the Flink lib directory of the scheduler and the task managers to
avoid dynamic class loading:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html
With that setup I get the following error:
{code:java}
Class=o.a.f.r.e.ExecutionGraph Msg=Source: Custom Source -> Sink: Unnamed (1/1) (3f12f6953a235eb43f07cdf7966b5fcf) switched from RUNNING to FAILED.
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.
    at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:235) ~[iot-mirror-device.jar:na]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95) ~[iot-mirror-device.jar:na]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231) ~[iot-mirror-device.jar:na]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) ~[iot-mirror-device.jar:na]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
Caused by: java.lang.ClassCastException: cannot assign instance of org.apache.commons.collections.map.LinkedMap to field org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit of type org.apache.commons.collections.map.LinkedMap in instance of org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) ~[na:1.8.0_91]
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) ~[na:1.8.0_91]
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) ~[iot-mirror-device.jar:na]
{code}
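The ClassCastException above names the same class, org.apache.commons.collections.map.LinkedMap, on both sides of the assignment. One possible explanation, which is an assumption and not confirmed in this issue, is that the class was loaded by two different class loaders: the JVM treats such classes as distinct types even when their names match. A minimal self-contained sketch of that effect follows. Payload and IsolatedLoader are hypothetical stand-ins, not Flink or commons-collections classes, and the sketch assumes the compiled class files are readable from the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassLoaderIdentityDemo {

    /** Hypothetical stand-in for a library class such as LinkedMap. */
    public static class Payload {
        public Payload() {}
    }

    /** Defines exactly one class from raw bytes and never delegates to the application loader. */
    static final class IsolatedLoader extends ClassLoader {
        private final String className;
        private final byte[] classBytes;

        IsolatedLoader(String className, byte[] classBytes) {
            super(null); // null parent: only the bootstrap loader is consulted first
            this.className = className;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    /** Reads the compiled bytes of Payload from the classpath. */
    static byte[] readPayloadBytes() throws IOException {
        String resource = Payload.class.getName().replace('.', '/') + ".class";
        try (InputStream in = Payload.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Loads a second copy of Payload through a fresh, isolated class loader. */
    static Class<?> loadIsolatedCopy() throws Exception {
        String name = Payload.class.getName();
        return new IsolatedLoader(name, readPayloadBytes()).loadClass(name);
    }

    /** Returns true if casting the object to the application loader's Payload fails. */
    static boolean castFails(Object candidate) {
        try {
            Payload ignored = (Payload) candidate;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> copy = loadIsolatedCopy();
        Object instance = copy.getDeclaredConstructor().newInstance();
        System.out.println("same name: " + copy.getName().equals(Payload.class.getName()));
        System.out.println("same Class object: " + (copy == Payload.class));
        System.out.println("cast fails: " + castFails(instance));
    }
}
```

If this is what happens here, the thing to check would be whether the affected classes are visible both in the Flink lib directory and inside the job jar.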
> Flink Scheduler goes OOM, suspecting a memory leak
> --------------------------------------------------
>
> Key: FLINK-9080
> URL: https://issues.apache.org/jira/browse/FLINK-9080
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.4.0
> Reporter: Rohit Singh
> Priority: Critical
> Attachments: Top Level packages.JPG, Top level classes.JPG,
> classesloaded vs unloaded.png
>
>
> Running Flink version 1.4.0 on Mesos, with the scheduler running along with the job
> manager in a single container, whereas the task managers run in separate
> containers.
> A couple of jobs were running continuously, and the Flink scheduler was working
> properly along with the task managers. Due to some change in the data, one of the jobs
> started failing continuously. In the meantime, there was a surge in Flink
> scheduler memory usage, and it eventually died of OOM.
>
> A memory dump analysis was done.
> The findings were as follows: !Top Level packages.JPG!!Top level classes.JPG!
> * The majority of the top loaded packages retaining heap pointed to
> Flinkuserclassloader, glassfish (Jersey library), and Finalizer classes (see the
> top level packages image).
> * The top level classes were those of Flinkuserclassloader (see the top level classes image).
> * Far fewer classes were unloaded than loaded, in spite of
> adding the JVM options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
> (see the attached classes loaded vs unloaded graph; the scheduler was restarted 3 times).
> * There were custom classes as well, which were duplicated during subsequent
> class uploads.
> All the heap dump images are attached. Can you suggest some pointers on how
> to overcome this issue?
>
>
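The classes loaded vs unloaded figures mentioned in the description can also be read from inside a JVM through the standard java.lang.management API. The following is a minimal sketch only; the class name ClassLoadingStats is illustrative and not part of Flink. It reports the counters of the JVM it runs in, so it would have to run inside the scheduler process (or be replaced by a remote JMX connection) to be of use here.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassLoadingStats {

    /** Returns a one-line snapshot of the class loading counters of the current JVM. */
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),       // classes currently loaded
                bean.getTotalLoadedClassCount(),  // classes loaded since JVM start
                bean.getUnloadedClassCount());    // classes unloaded since JVM start
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A currently-loaded count that keeps growing across job restarts while the unloaded count stays flat would be consistent with the class loader retention seen in the heap dump.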