[
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134028#comment-15134028
]
Sean Owen commented on SPARK-13198:
-----------------------------------
I don't see evidence of a problem here yet. Objects stay on the heap until they
are GCed. Are you sure you actually triggered a GC, e.g. with a profiler, and
then measured? You also generally would never stop and start a context within
one application.
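For reference, a minimal sketch of the single-context pattern alluded to above: create one SparkContext, reuse it for every iteration, and stop it once at the end. The master URL and HDFS path are placeholders carried over from the report, not real endpoints. This targets the Spark 1.6 API (`SQLContext`) used in the issue; it requires a live Spark/Mesos cluster to run.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SingleContextApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
    conf.set("spark.mesos.coarse", "true")
    conf.set("spark.cores.max", "10")

    // Create the context once; a Spark application is expected to use a
    // single SparkContext for its whole lifetime.
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    for (i <- 1 until 100) {
      // Re-read the same data each iteration, as in the original reproducer.
      val events = sqlContext.read.parquet("hdfs://localhost/tmp/something")
      println(s"Iteration ($i), number of events: ${events.count}")
    }

    // Stop exactly once, when the application is done.
    sc.stop()
  }
}
```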
> sc.stop() does not clean up on driver, causes Java heap OOM.
> ------------------------------------------------------------
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.6.0
> Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the
> driver stops working with a "io.netty.handler.codec.EncoderException:
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code, loading ~7MB of parquet data each
> iteration. The driver heap size is left at its default of 1GB:
> {code:java}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
>
> def main(args: Array[String]): Unit = {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>     val events = sqlContext.read.parquet("hdfs://localhost/tmp/something")
>     println(s"Context ($i), number of events: ${events.count}")
>     sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 iterations on my cluster. Increasing the
> number of cores to 50 in the above example exhausts the heap after 12
> contexts.
> Dumping the heap reveals many equally sized `CoarseMesosSchedulerBackend`
> objects (see attachments). Digging into the inner objects shows that the
> `executorDataMap` holds 99% of the data in each of them. I believe that is
> beside the point, though, as I'd expect the whole object to be garbage
> collected (or freed) on sc.stop().
> Additionally, I can see in the Spark web UI that each time a new context is
> created, the number appended to the "SQL" tab increments by one (i.e. the last
> iteration shows "SQL99"). After stopping and creating a completely new
> context, I expected this counter to reset (back to "SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]