[ 
https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4906.
------------------------------
    Resolution: Not A Problem

I don't think this is a bug so much as an indication that you need to set 
spark.ui.retainedStages and/or spark.ui.retainedJobs to lower values.
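
As a rough illustration of the suggestion above, the two properties can be passed to spark-submit as --conf flags. The helper name ui_retention_flags and the value 200 are assumptions made for this sketch, not recommended settings; pick limits that suit the workload.

```python
# Hypothetical helper (not part of Spark): renders the two UI retention
# properties named above as spark-submit "--conf key=value" arguments.
def ui_retention_flags(retained_stages, retained_jobs):
    settings = {
        "spark.ui.retainedStages": retained_stages,
        "spark.ui.retainedJobs": retained_jobs,
    }
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # 200 is an illustrative value only.
    print(" ".join(["spark-submit"] + ui_retention_flags(200, 200)))
```

The same keys can also be placed in spark-defaults.conf or set on the application's SparkConf. Lower limits mean fewer completed stages and jobs stay visible in the web UI.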

> Spark master OOMs with exception stack trace stored in JobProgressListener
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4906
>                 URL: https://issues.apache.org/jira/browse/SPARK-4906
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.1.1, 1.6.1
>            Reporter: Mingyu Kim
>         Attachments: LeakingJobProgressListener2OOM.docx, Screen Shot 
> 2016-06-26 at 10.43.57 AM.png
>
>
> Spark master was OOMing with a lot of stack traces retained in 
> JobProgressListener. The object dependency chain is as follows:
> JobProgressListener.stageIdToData => StageUIData.taskData => 
> TaskUIData.errorMessage
> Each error message is ~10 KB since it contains the entire stack trace. As we 
> have a lot of tasks, when all of the tasks across multiple stages fail, these 
> error messages accounted for 0.5 GB of heap at some point.
> Please correct me if I'm wrong, but it looks like all the task info for 
> running applications is kept in memory, which means it's almost always bound 
> to OOM for long-running applications. Would it make sense to fix this, for 
> example, by spilling some UI state to disk?
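
For a sense of scale, the figures in the report can be turned into a back-of-envelope estimate. This sketch assumes binary units (10 KB = 10 * 1024 bytes, 0.5 GB = 512 MiB). The function names and the count of 50 failed tasks per stage are hypothetical, and 1000 retained stages is assumed here as the limit in effect.

```python
# Figures taken from the report above; binary units are an assumption.
ERROR_MESSAGE_BYTES = 10 * 1024          # ~10 KB per TaskUIData.errorMessage
RETAINED_HEAP_BYTES = 512 * 1024 ** 2    # ~0.5 GB of heap


def retained_error_messages(heap_bytes, message_bytes):
    """Approximate number of error messages that fit in heap_bytes."""
    return heap_bytes // message_bytes


def worst_case_bytes(retained_stages, failed_tasks_per_stage, message_bytes):
    """Upper bound on error-message heap if every retained stage is full."""
    return retained_stages * failed_tasks_per_stage * message_bytes


if __name__ == "__main__":
    # Roughly how many failed-task entries 0.5 GB corresponds to.
    print(retained_error_messages(RETAINED_HEAP_BYTES, ERROR_MESSAGE_BYTES))
    # Hypothetical workload: 1000 retained stages, 50 failed tasks each.
    print(worst_case_bytes(1000, 50, ERROR_MESSAGE_BYTES))
```

Under these assumptions, 0.5 GB corresponds to roughly 52,000 retained error messages, so the bound scales directly with the number of stages kept.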


