[ https://issues.apache.org/jira/browse/MESOS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy St. Clair resolved MESOS-1746.
--------------------------------------
Resolution: Fixed
Fix Version/s: 0.21.0
commit 8538eed683eea99a340ff5272205113db0580a25
Author: Chengwei Yang <[email protected]>
Date: Wed Oct 15 14:12:26 2014 -0500
Delete framework data in TaskStatus to avoid OOM
Spark was found to use TaskStatus.data to transfer computed results, which
makes the mesos-master resident (RES) memory grow quickly until the process is
killed by the OOM killer.
Review: https://reviews.apache.org/r/25184
> clear TaskStatus data to avoid OOM
> ----------------------------------
>
> Key: MESOS-1746
> URL: https://issues.apache.org/jira/browse/MESOS-1746
> Project: Mesos
> Issue Type: Bug
> Environment: mesos-0.19.0
> Reporter: Chengwei Yang
> Assignee: Chengwei Yang
> Fix For: 0.21.0
>
>
> Spark on Mesos may use TaskStatus to transfer computed results between the
> worker and the scheduler; the relevant source code (Spark 1.0.2) is below:
> {code}
> val serializedResult = {
>   if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
>       AkkaUtils.reservedSizeBytes) {
>     logInfo("Storing result for " + taskId + " in local BlockManager")
>     val blockId = TaskResultBlockId(taskId)
>     env.blockManager.putBytes(
>       blockId, serializedDirectResult,
>       StorageLevel.MEMORY_AND_DISK_SER)
>     ser.serialize(new IndirectTaskResult[Any](blockId))
>   } else {
>     logInfo("Sending result for " + taskId + " directly to driver")
>     serializedDirectResult
>   }
> }
> {code}
> In our test environment, we raised akkaFrameSize from the default value
> (10MB) to 128MB, which caused our mesos-master process to run out of memory
> within tens of minutes when running Spark tasks in fine-grained mode.
> As you can see, even with akkaFrameSize set back to the default value (10MB),
> mesos-master is still very likely to hit OOM, only more slowly.
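To make the sizes concrete, here is a minimal Scala sketch of the threshold check in the snippet above. The object and method names are hypothetical, and the 200 KB reserve is an assumption about the value of AkkaUtils.reservedSizeBytes in Spark 1.0.2, which this issue does not state.

```scala
object DirectResultThreshold {
  // ASSUMPTION: value of AkkaUtils.reservedSizeBytes in Spark 1.0.2 (not given in this issue).
  val reservedSizeBytes: Long = 200L * 1024

  // Frame size in bytes for a frame size configured in MB.
  def frameSizeBytes(akkaFrameSizeMb: Int): Long =
    akkaFrameSizeMb.toLong * 1024 * 1024

  // Mirrors the else-branch above: a result is sent directly (and so travels
  // in TaskStatus.data) only when it is below frame size minus the reserve.
  def sentDirectly(resultBytes: Long, akkaFrameSizeMb: Int): Boolean =
    resultBytes < frameSizeBytes(akkaFrameSizeMb) - reservedSizeBytes

  // Largest serialized result, in bytes, that is still sent directly.
  def maxDirectResultBytes(akkaFrameSizeMb: Int): Long =
    frameSizeBytes(akkaFrameSizeMb) - reservedSizeBytes - 1
}
```

Under that assumption, raising the frame size from 10MB to 128MB raises the largest payload a single TaskStatus can carry from just under 10MB to just under 128MB.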
> So I think it's good to delete data from TaskStatus, since this field is
> intended only for the framework running on top of Mesos and we are not
> interested in it.
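The proposal can be illustrated with a small Scala sketch. This is not the Mesos change itself (see the review linked above for that); TaskStatus and StatusStore here are hypothetical, simplified stand-ins, and the sketch assumes the full update is still forwarded to the framework while only a copy without the payload is retained.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the Mesos TaskStatus message.
final case class TaskStatus(taskId: String, state: String, data: Option[Array[Byte]])

// Hypothetical stand-in for the master's per-task bookkeeping, not Mesos code.
final class StatusStore {
  private val retained = mutable.Map.empty[String, TaskStatus]

  // Forward the full update (payload included) to the framework,
  // then retain only a copy with the framework data cleared.
  def handleUpdate(update: TaskStatus)(forward: TaskStatus => Unit): Unit = {
    forward(update)
    retained(update.taskId) = update.copy(data = None)
  }

  // Total payload bytes still held by the store.
  def retainedBytes: Long =
    retained.values.map(_.data.fold(0L)(_.length.toLong)).sum

  def get(taskId: String): Option[TaskStatus] = retained.get(taskId)
}
```

With this shape, the retained state no longer grows with the size of the results that frameworks attach to their status updates.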