[jira] [Commented] (MESOS-700) more efficient distribution of frameworks via HDFS

Du Li (JIRA) Thu, 19 Sep 2013 16:22:16 -0700

    [ https://issues.apache.org/jira/browse/MESOS-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772451#comment-13772451 ]


Du Li commented on MESOS-700:
-----------------------------

Thanks, Ben, for the quick attention. 

I would further recommend that one copy of the tgz/zip file and its extracted 
contents be shared among all tasks of a job or framework instance, and that it 
be garbage collected at the end of the job. One main advantage of Spark and 
BDAS is that they avoid disk I/O by caching data and intermediate results in 
memory. A typical partition of data assigned to a task/slave is perhaps around 
64 MB. This advantage would be undermined if you had to download/unzip or copy 
a 79 MB distribution of the framework for each task, which involves quite a 
bit of disk I/O.
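
To make the sharing idea concrete, here is a minimal sketch of a per-slave 
helper that keeps one extracted copy per job and removes it when the job ends. 
The class name JobSandboxes, the unpack callback, and the directory layout are 
all hypothetical illustrations, not part of the Mesos code base, and the sketch 
ignores concurrent task launches.

```python
import shutil
from pathlib import Path


class JobSandboxes:
    """Hypothetical per-slave helper: one shared extracted copy per job."""

    def __init__(self, root):
        self.root = Path(root)

    def dist_dir(self, job_id, unpack):
        """Return the job's shared directory.

        `unpack(directory)` is an assumed callback that downloads and
        extracts the distribution; it runs only for the first task of a job.
        """
        target = self.root / job_id
        if not target.is_dir():
            target.mkdir(parents=True)
            unpack(target)
        return target

    def job_finished(self, job_id):
        """Garbage-collect the shared copy at the end of the job."""
        shutil.rmtree(self.root / job_id, ignore_errors=True)
```

Every task of the same job would call dist_dir with the same job_id, so only 
the first call pays the download/unzip cost; job_finished would be called once 
when the job completes.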
                
> more efficient distribution of frameworks via HDFS
> --------------------------------------------------
>
>                 Key: MESOS-700
>                 URL: https://issues.apache.org/jira/browse/MESOS-700
>             Project: Mesos
>          Issue Type: Improvement
>          Components: framework
>    Affects Versions: 0.13.0, 0.14.0, 0.15.0
>         Environment: general
>            Reporter: Du Li
>             Fix For: 0.13.0, 0.14.0, 0.15.0
>
>
> I was exploring the latest code (0.15.0) at https://github.com/apache/mesos 
> to test the tgz distribution of frameworks. Take Spark for example. I created 
> a tgz of the Spark binary and put it on HDFS. After a job is submitted, it is 
> decomposed into many tasks. For each task, the assigned Mesos slave downloads 
> the tgz from HDFS, unzips it, and executes a script to launch the task. 
> This seems very wasteful and unnecessary. 
> Does the following suggestion make sense? When a Spark job is submitted, the 
> Spark/Mesos master calculates a checksum or something similar for the tgz 
> distribution. The checksum is then sent to the slaves when tasks are 
> assigned. If the same file has already been downloaded/unzipped, a slave 
> launches the task directly. This way the tgz is processed at most once for 
> each job (which may have thousands of tasks). The aggregate savings would be 
> tremendous.
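
The checksum suggestion quoted above could be sketched roughly as follows. 
This is only an illustration under stated assumptions: SHA-256 is an arbitrary 
choice of checksum, and ensure_extracted, the fetch callback, and the cache 
layout are hypothetical names rather than anything in Mesos or Spark. It also 
does not handle two tasks racing to populate the cache at the same time.

```python
import hashlib
import tarfile
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_extracted(checksum, cache_root, fetch):
    """Return the directory holding the extracted tgz for `checksum`.

    `fetch(dest_path)` is an assumed callback that downloads the tgz
    (for example from HDFS); it is called only on a cache miss.
    """
    cache_root = Path(cache_root)
    target = cache_root / checksum
    if target.is_dir():
        return target  # already downloaded and unzipped: launch directly

    cache_root.mkdir(parents=True, exist_ok=True)
    archive = cache_root / (checksum + ".tgz")
    fetch(archive)
    if file_checksum(archive) != checksum:
        archive.unlink()
        raise ValueError("checksum mismatch for downloaded distribution")

    # Extract into a staging directory and rename it afterwards, so a
    # half-extracted tree is never mistaken for a complete one.
    staging = cache_root / (checksum + ".partial")
    staging.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(staging)
    staging.rename(target)
    archive.unlink()
    return target
```

In this sketch the master would compute file_checksum once at job submission 
and send the digest with each task; a slave would call ensure_extracted before 
launching, so the download and unzip happen only on the first task it receives 
for that distribution.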
