Du Li created MESOS-700:
---------------------------

             Summary: more efficient distribution of frameworks via HDFS
                 Key: MESOS-700
                 URL: https://issues.apache.org/jira/browse/MESOS-700
             Project: Mesos
          Issue Type: Improvement
          Components: framework
    Affects Versions: 0.13.0, 0.14.0, 0.15.0
         Environment: general
            Reporter: Du Li
             Fix For: 0.13.0, 0.14.0, 0.15.0


I was exploring the latest code (0.15.0) at https://github.com/apache/mesos to 
test the tgz distribution of frameworks. Take Spark as an example. I created a 
tgz of the Spark binary and put it on HDFS. After a job is submitted, it is 
decomposed into many tasks. For each task, the assigned Mesos slave downloads 
the tgz from HDFS, unpacks it, and executes a script to launch the task. 
Repeating the download and unpacking for every task seems wasteful and 
unnecessary. 

Does the following suggestion make sense? When a Spark job is submitted, the 
Spark/Mesos master calculates a checksum or something similar for the tgz 
distribution. The checksum is then sent to the slaves when tasks are assigned. 
If a slave has already downloaded and unpacked the same file, it launches the 
task directly. This way the tgz is processed at most once per slave for each 
job (which may have thousands of tasks). The aggregate savings would be 
tremendous.
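
To make the suggestion concrete, here is a rough sketch in Python of what the
slave-side check could look like. It is not Mesos code and does not use any
Mesos or HDFS API. The names file_checksum and ensure_unpacked, the fetch
callback, the cache directory layout, and the choice of SHA-256 are all
assumptions made for illustration.

```python
import hashlib
import os
import tarfile
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_unpacked(checksum, cache_root, fetch):
    """Return a directory holding the unpacked tgz identified by `checksum`.

    `fetch(dest_path)` is a hypothetical callback that downloads the tgz
    (for example from HDFS) to `dest_path`. It runs only on a cache miss.
    """
    target = os.path.join(cache_root, checksum)
    marker = os.path.join(target, ".complete")
    if os.path.exists(marker):
        return target  # already downloaded and unpacked by an earlier task

    os.makedirs(target, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=cache_root) as tmp:
        archive = os.path.join(tmp, "framework.tgz")
        fetch(archive)
        # Verify that the download matches what the master announced.
        if file_checksum(archive) != checksum:
            raise ValueError("checksum mismatch for downloaded tgz")
        # Assumes the archive comes from a trusted source.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
    # Write the marker last so a partial unpack is never treated as a hit.
    with open(marker, "w"):
        pass
    return target
```

With something like this, the master would compute file_checksum once per
job and ship the digest with each task; a slave would call ensure_unpacked
before launching, so only the first task on that slave pays for the download.
A real implementation would also need locking for tasks that start
concurrently on the same slave, and some policy for evicting old entries.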

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira