[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350135#comment-15350135 ]
Vladislav Kuzemchik commented on SPARK-4906:
--------------------------------------------
Same thing with Spark 1.6.
We recently migrated from 1.3.1 to 1.6.1.
We had to increase the heap from 1G to 32G just to keep it running for 2-3 days.
With a streaming application the heap grows constantly, so we have to restart
the streaming application once in a while.
Attached is a screenshot of a 2G heap in JProfiler.
> Spark master OOMs with exception stack trace stored in JobProgressListener
> --------------------------------------------------------------------------
>
> Key: SPARK-4906
> URL: https://issues.apache.org/jira/browse/SPARK-4906
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.1.1
> Reporter: Mingyu Kim
> Attachments: LeakingJobProgressListener2OOM.docx
>
>
> Spark master was OOMing with a lot of stack traces retained in
> JobProgressListener. The object dependency chain is as follows:
> JobProgressListener.stageIdToData => StageUIData.taskData =>
> TaskUIData.errorMessage
> Each error message is ~10 KB since it contains the entire stack trace. As we
> have a lot of tasks, when all of the tasks across multiple stages go bad,
> these error messages accounted for 0.5 GB of heap at some point.
> Please correct me if I'm wrong, but it looks like all the task info for
> running applications is kept in memory, which means it's almost always bound
> to OOM for long-running applications. Would it make sense to fix this, for
> example, by spilling some UI state to disk?
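
For a sense of scale, the two figures in the description (about 10 KB per error message, 0.5 GB retained in total) imply roughly 50,000 error messages held in memory at once. Below is a minimal sketch of that arithmetic. It assumes binary units and ignores JVM object and string overhead, so the real count is probably lower. The helper name `retained_messages` is illustrative only and is not part of Spark.

```python
# Back-of-envelope estimate based on the figures quoted in the issue.
# Assumptions: binary units (1 KB = 1024 bytes), no per-object JVM overhead.

KIB = 1024
GIB = 1024 ** 3


def retained_messages(heap_bytes: int, bytes_per_message: int) -> int:
    """Return how many whole messages of the given size fit in heap_bytes."""
    if bytes_per_message <= 0:
        raise ValueError("bytes_per_message must be positive")
    return heap_bytes // bytes_per_message


# 0.5 GB of heap, ~10 KB per error message (one full stack trace each).
estimated = retained_messages(GIB // 2, 10 * KIB)
print(f"~{estimated} error messages retained")
```

Because every failed task adds one such message and nothing in the chain above is evicted while the application runs, this count only grows over the lifetime of a long-running application.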
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)