[jira] [Comment Edited] (MESOS-1746) clear TaskStatus data to avoid OOM

Chengwei Yang (JIRA) Tue, 16 Sep 2014 18:07:01 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136605#comment-14136605
 ]


Chengwei Yang edited comment on MESOS-1746 at 9/17/14 1:06 AM:
---------------------------------------------------------------

[~tstclair], yes, Spark stores very large data in TaskStatus. TaskStatus has a 
"data" field that is meant to hold application-specific data, so we cannot 
prevent applications (like Spark) from doing so.

Please help review: https://reviews.apache.org/r/25184/



> clear TaskStatus data to avoid OOM
> ----------------------------------
>
>                 Key: MESOS-1746
>                 URL: https://issues.apache.org/jira/browse/MESOS-1746
>             Project: Mesos
>          Issue Type: Bug
>         Environment: mesos-0.19.0
>            Reporter: Chengwei Yang
>            Assignee: Chengwei Yang
>
> Spark on Mesos may use TaskStatus to transfer computed results between the 
> worker and the scheduler, as the source code below shows (Spark 1.0.2):
> {code}
>         val serializedResult = {
>           if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
>               AkkaUtils.reservedSizeBytes) {
>             logInfo("Storing result for " + taskId + " in local BlockManager")
>             val blockId = TaskResultBlockId(taskId)
>             env.blockManager.putBytes(
>               blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
>             ser.serialize(new IndirectTaskResult[Any](blockId))
>           } else {
>             logInfo("Sending result for " + taskId + " directly to driver")
>             serializedDirectResult
>           }
>         }
> {code}
> In our test environment, we enlarged akkaFrameSize from its default value 
> (10MB) to 128MB, and this causes our mesos-master process to run out of 
> memory (OOM) within tens of minutes when running Spark tasks in fine-grained 
> mode.
> As you can see, even with akkaFrameSize changed back to its default value 
> (10MB), mesos-master is still very likely to OOM, only more slowly.
> So I think it's good to delete data from TaskStatus, since the field is only 
> intended for the framework running on top and we are not interested in it.
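
The proposal above can be illustrated with a small, self-contained C++17 
model. This is only a sketch under stated assumptions, not the Mesos code and 
not necessarily what the patch under review does: the TaskStatus struct and 
the StatusHistory class are hypothetical stand-ins, and the sketch assumes 
the master keeps its own copy of each status update while the original update 
is still forwarded to the framework with its payload intact.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Mesos TaskStatus message. The real type is
// protobuf-generated and has more fields.
struct TaskStatus {
  std::string task_id;
  std::string state;
  std::optional<std::string> data;  // opaque, framework-specific payload
};

// Hypothetical model of the status copies retained by the master.
class StatusHistory {
public:
  // Stores a copy of `status` with the payload removed. The caller keeps
  // the original and can forward it to the framework unchanged.
  void retain(const TaskStatus& status) {
    TaskStatus stored = status;
    stored.data.reset();  // on a protobuf message: stored.clear_data()
    assert(!stored.data.has_value());
    statuses_.push_back(std::move(stored));
  }

  // Total payload bytes held in the history.
  std::size_t retainedPayloadBytes() const {
    std::size_t total = 0;
    for (const TaskStatus& s : statuses_) {
      if (s.data) total += s.data->size();
    }
    return total;
  }

  std::size_t size() const { return statuses_.size(); }

private:
  std::vector<TaskStatus> statuses_;
};
```

In this model the retained history grows by a small, roughly constant amount 
per update regardless of how large the framework's payload is, which is the 
property the issue asks for.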



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
