Chengwei Yang created MESOS-1746:
------------------------------------

             Summary: clear TaskStatus data to avoid OOM
                 Key: MESOS-1746
                 URL: https://issues.apache.org/jira/browse/MESOS-1746
             Project: Mesos
          Issue Type: Bug
         Environment: mesos-0.19.0
            Reporter: Chengwei Yang
            Assignee: Chengwei Yang


Spark on Mesos may use TaskStatus to transfer computed results between the 
worker and the scheduler. The relevant source code (Spark 1.0.2) is shown below:

{code}
        val serializedResult = {
          if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
              AkkaUtils.reservedSizeBytes) {
            logInfo("Storing result for " + taskId + " in local BlockManager")
            val blockId = TaskResultBlockId(taskId)
            env.blockManager.putBytes(
              blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
            ser.serialize(new IndirectTaskResult[Any](blockId))
          } else {
            logInfo("Sending result for " + taskId + " directly to driver")
            serializedDirectResult
          }
        }
{code}

In our test environment, we enlarged akkaFrameSize from its default value 
(10MB) to 128MB, and this causes our mesos-master process to run out of memory 
(OOM) within tens of minutes when running Spark tasks in fine-grained mode.
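
The effect of the frame size on the check in the excerpt above can be sketched 
as follows. This is only an illustration: ResultRouting and sentDirectly are 
hypothetical names, and the 200KB reserved size is an assumed value for 
AkkaUtils.reservedSizeBytes, not something confirmed here.

```scala
// Hypothetical stand-in for the size check in the excerpt above.
object ResultRouting {
  // Assumed value of AkkaUtils.reservedSizeBytes; verify against the Spark source.
  val ReservedSizeBytes: Long = 200L * 1024

  /** True when a serialized result is small enough to be sent directly,
    * i.e. it takes the `else` branch of the excerpt. */
  def sentDirectly(resultBytes: Long, frameSizeBytes: Long): Boolean =
    resultBytes < frameSizeBytes - ReservedSizeBytes

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    val result = 64 * mb
    // With the default 10MB frame size, a 64MB result goes to the BlockManager.
    println(sentDirectly(result, 10 * mb))
    // With a 128MB frame size, the same 64MB result is sent directly.
    println(sentDirectly(result, 128 * mb))
  }
}
```

Under these assumptions, raising the frame size moves much larger results onto 
the direct path, which is the path that carries them in the status update.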

As you can see, even after changing akkaFrameSize back to the default value 
(10MB), mesos-master is still very likely to OOM, only more slowly.

So I think it would be good to clear the data from TaskStatus, since it is only 
intended for the framework running on top of Mesos and Mesos itself is not 
interested in it.
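
A minimal sketch of the idea, not the actual Mesos change: StatusUpdate and 
MasterSketch are hypothetical stand-ins (the real TaskStatus is a protobuf 
message handled in the C++ master), and it assumes the framework still receives 
the original update while only the copy kept by the master is stripped.

```scala
// Hypothetical model of a status update; not the real Mesos TaskStatus type.
final case class StatusUpdate(
    taskId: String,
    state: String,
    data: Option[Array[Byte]])

object MasterSketch {
  /** Returns a copy without the framework payload, for retention by the master. */
  def withoutData(update: StatusUpdate): StatusUpdate =
    update.copy(data = None)

  def main(args: Array[String]): Unit = {
    val update =
      StatusUpdate("task-1", "TASK_FINISHED", Some(new Array[Byte](1024)))
    val retained = withoutData(update)
    // The retained copy keeps the task id and state but no payload.
    println(retained.data.isEmpty)
  }
}
```

The original update is left untouched, so it can still be forwarded to the 
framework with its payload intact.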



--
This message was sent by Atlassian JIRA
(v6.2#6252)
