[
https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233697#comment-15233697
]
Wei Deng commented on SPARK-12511:
----------------------------------
I also ran into an OOM in the streaming driver while testing a very simple
pyspark streaming job (taking data from a direct Kafka stream) on Spark 1.6.0. I
haven't configured checkpointing yet. The driver would always crash after
running for 9+ hours, while nothing abnormal showed up on the executors. Once I
switched to Spark 1.6.1 (which should include the fix for this bug), my pyspark
streaming driver has been running for 14 hours now without any sign of a memory
leak or OOM.
[~zsxwing] Could you please confirm whether this bug can also impact a pyspark
streaming driver *without* checkpointing configured?
In case anybody is interested in the pyspark streaming code that triggered the
driver OOM under Spark 1.6.0, here it is:
https://github.com/avinashmandava/energyiot/blob/master/analytics/writemetrics.py
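For reference, the job has roughly the following shape (a minimal sketch,
assuming Spark 1.6-era APIs; the app name, broker list, topic, and processing
step are placeholders, not the actual values from writemetrics.py):
{noformat}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="writemetrics-sketch")
ssc = StreamingContext(sc, 10)  # 10-second batches

# Direct (receiver-less) Kafka stream; broker list and topic are placeholders
stream = KafkaUtils.createDirectStream(
    ssc, ["metrics"], {"metadata.broker.list": "localhost:9092"})

# Note: no ssc.checkpoint(...) call -- checkpointing is not configured,
# matching the setup described above
stream.map(lambda kv: kv[1]).count().pprint()

ssc.start()
ssc.awaitTermination()
{noformat}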
> streaming driver with checkpointing unable to finalize leading to OOM
> ---------------------------------------------------------------------
>
> Key: SPARK-12511
> URL: https://issues.apache.org/jira/browse/SPARK-12511
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Streaming
> Affects Versions: 1.5.2, 1.6.0
> Environment: pyspark 1.5.2
> yarn 2.6.0
> python 2.6
> centos 6.5
> openjdk 1.8.0
> Reporter: Antony Mayi
> Assignee: Shixiong Zhu
> Priority: Critical
> Fix For: 1.6.1, 2.0.0
>
> Attachments: bug.py, finalizer-classes.png, finalizer-pending.png,
> finalizer-spark_assembly.png
>
>
> A Spark streaming application configured with checkpointing fills the
> driver's heap with ZipFileInputStream instances, apparently because
> spark-assembly.jar (and potentially others, for example snappy-java.jar)
> gets referenced (loaded?) repeatedly. The Java Finalizer cannot finalize
> these ZipFileInputStream instances, so they eventually consume the whole
> heap and crash the driver with an OOM.
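> A job of roughly this shape is enough to trigger it (a minimal sketch only;
> the actual script is the attached [^bug.py], and the checkpoint directory
> and socket endpoint here are placeholders):
> {noformat}
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
> 
> sc = SparkContext(appName="bug-sketch")
> ssc = StreamingContext(sc, 1)  # 1-second batches
> ssc.checkpoint("/tmp/bug-checkpoint")  # enabling checkpointing triggers the leak
> 
> # Any input type will do; socketTextStream as in the attachment
> lines = ssc.socketTextStream("localhost", 9999)
> lines.count().pprint()
> 
> ssc.start()
> ssc.awaitTermination()
> {noformat}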
> h2. Steps to reproduce:
> * Submit attached [^bug.py] to spark
> * Leave it running and monitor the driver java process heap
> ** with a heap dump you will primarily see growing counts of byte arrays
> (the accumulated zip payload of the jar references; the command used is
> shown after this list):
> {noformat}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 32653 32735296 [B
> 2: 48000 5135816 [C
> 3: 41 1344144 [Lscala.concurrent.forkjoin.ForkJoinTask;
> 4: 11362 1261816 java.lang.Class
> 5: 47054 1129296 java.lang.String
> 6: 25460 1018400 java.lang.ref.Finalizer
> 7: 9802 789400 [Ljava.lang.Object;
> {noformat}
> ** with visualvm you can see:
> *** an increasing number of objects pending finalization
> !finalizer-pending.png!
> *** an increasing number of ZipFileInputStream instances related to
> spark-assembly.jar, referenced by the Finalizer
> !finalizer-spark_assembly.png!
> * Depending on heap size and running time, this will lead to a driver OOM
> crash
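> A histogram like the one above can be captured with standard JDK tooling,
> e.g. (the driver pid is a placeholder):
> {noformat}
> jmap -histo <driver-pid> | head -n 20
> {noformat}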
> h2. Comments
> * The [^bug.py] is a lightweight proof of the problem. In production I am
> experiencing a quite rapid effect - in a few hours it eats gigabytes of heap
> and kills the app.
> * If the same [^bug.py] is run without checkpointing, there is no issue
> whatsoever.
> * Not sure whether it is pyspark-specific.
> * In [^bug.py] I am using the socketTextStream input, but the problem seems
> to be independent of the input type (in production I see the same problem
> with the Kafka direct stream, and have seen it even with textFileStream).
> * It happens even if the input stream doesn't produce any data.