[ https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-12511:
------------------------------------
Assignee: Shixiong Zhu (was: Apache Spark)
> streaming driver with checkpointing unable to finalize leading to OOM
> ---------------------------------------------------------------------
>
> Key: SPARK-12511
> URL: https://issues.apache.org/jira/browse/SPARK-12511
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Streaming
> Affects Versions: 1.5.2, 1.6.0
> Environment: pyspark 1.5.2
> yarn 2.6.0
> python 2.6
> centos 6.5
> openjdk 1.8.0
> Reporter: Antony Mayi
> Assignee: Shixiong Zhu
> Priority: Critical
> Attachments: bug.py, finalizer-classes.png, finalizer-pending.png,
> finalizer-spark_assembly.png
>
>
> A Spark Streaming application configured with checkpointing fills the
> driver's heap with ZipFileInputStream instances because spark-assembly.jar
> (and potentially others, for example snappy-java.jar) is repeatedly
> referenced (loaded?). The Java Finalizer can't finalize these
> ZipFileInputStream instances, and they eventually consume the whole heap,
> crashing the driver with an OOM.
> h2. Steps to reproduce:
> * Submit the attached [^bug.py] to Spark
> * Leave it running and monitor the heap of the driver Java process
> ** in a heap dump you will mainly see a growing number of byte array
> instances (here the accumulated zip payload of the jar references):
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:         32653       32735296  [B
>    2:         48000        5135816  [C
>    3:            41        1344144  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    4:         11362        1261816  java.lang.Class
>    5:         47054        1129296  java.lang.String
>    6:         25460        1018400  java.lang.ref.Finalizer
>    7:          9802         789400  [Ljava.lang.Object;
> {noformat}
> ** with VisualVM you can see:
> *** an increasing number of objects pending finalization
> !finalizer-pending.png!
> *** an increasing number of ZipFileInputStream instances related to
> spark-assembly.jar and referenced by Finalizer
> !finalizer-spark_assembly.png!
> * Depending on the heap size and running time, this leads to a driver OOM
> crash
> h2. Comments
> * [^bug.py] is a lightweight proof of the problem. In production the
> effect is quite rapid: within a few hours it consumes gigabytes of heap
> and kills the app.
> * If the same [^bug.py] is run without checkpointing, there is no issue
> whatsoever.
> * Not sure whether it is specific to PySpark.
> * In [^bug.py] I am using the socketTextStream input, but the problem seems
> to be independent of the input type (in production I have the same problem
> with the Kafka direct stream, and I have seen it even with textFileStream).
> * It happens even if the input stream doesn't produce any data.
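
The heap monitoring step in the report could be scripted roughly as follows. This is only a sketch, not part of the original report: it assumes the histogram comes from `jmap -histo <pid>` on the driver JVM, and the helper names (`parse_histogram`, `instance_growth`, `snapshot`) are hypothetical. The sample rows are copied from the histogram quoted above.

```python
import re
import subprocess

# One data row of a class histogram: rank, instance count, bytes, class name.
HISTO_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)")

# Two rows taken from the histogram in the report, plus its header lines.
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:         32653       32735296  [B
   6:         25460        1018400  java.lang.ref.Finalizer
"""


def parse_histogram(text):
    """Return {class_name: (instances, bytes)} for every data row in text."""
    counts = {}
    for line in text.splitlines():
        # Strip mail-style "> " quoting so quoted histograms parse too.
        match = HISTO_ROW.match(line.lstrip("> "))
        if match:
            _, instances, size, name = match.groups()
            counts[name] = (int(instances), int(size))
    return counts


def instance_growth(before, after, class_name):
    """Change in instance count of class_name between two parsed snapshots."""
    return after.get(class_name, (0, 0))[0] - before.get(class_name, (0, 0))[0]


def snapshot(pid):
    """Assumption: jmap from the JDK is on PATH and may attach to the driver."""
    output = subprocess.check_output(
        ["jmap", "-histo", str(pid)], universal_newlines=True
    )
    return parse_histogram(output)


if __name__ == "__main__":
    print(parse_histogram(SAMPLE)["java.lang.ref.Finalizer"])
```

Taking `snapshot(pid)` at intervals and comparing `instance_growth(..., "java.lang.ref.Finalizer")` would show whether the Finalizer backlog keeps rising, which is the symptom described above.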