I was trying the latest code at https://github.com/apache/mesos. Take Spark for 
example: I created a tgz of the Spark binaries and put it on HDFS. After a job 
is submitted, it is decomposed into many tasks, and for each task the assigned 
Mesos slave downloads the tgz, unpacks it, and executes a script to launch the 
task. This seems very wasteful and unnecessary.

Does the following suggestion make sense? When a Spark job is submitted, the 
Spark/Mesos master computes a checksum (or the like) of the tgz distribution. 
The checksum is then sent to the slaves when tasks are assigned. If a slave has 
already downloaded and unpacked a file with that checksum, it launches the task 
directly. This way the tgz is processed at most once per slave for each job 
(which may have thousands of tasks), and the aggregate savings would be 
tremendous.
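To make the idea concrete, here is a minimal sketch of what the slave-side 
cache could look like. This is not Mesos code; the function names, the cache 
directory, and the use of SHA-256 are all illustrative assumptions:

```python
import hashlib
import os
import shutil
import tarfile
import tempfile

# Hypothetical cache directory on a slave. Unpacked distributions are
# keyed by their checksum, so each tgz is fetched/unpacked at most once
# per slave rather than once per task.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "executor_cache")

def checksum(path, algo="sha256", chunk=1 << 20):
    """Hash the archive in fixed-size chunks so large tgz files stay cheap."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def ensure_unpacked(tgz_path, digest):
    """Unpack the tgz only if this digest has not been seen before."""
    target = os.path.join(CACHE_DIR, digest)
    if os.path.isdir(target):
        return target  # cache hit: launch the task directly from here
    # Unpack into a staging directory first, then rename atomically,
    # so a crash never leaves a half-unpacked cache entry behind.
    staging = target + ".tmp"
    shutil.rmtree(staging, ignore_errors=True)
    os.makedirs(staging)
    with tarfile.open(tgz_path, "r:gz") as tar:
        tar.extractall(staging)
    os.rename(staging, target)
    return target
```

In this sketch the master would call something like `checksum()` once per job 
and ship the digest with each task, while every slave calls 
`ensure_unpacked()` before launching; all invocations after the first on a 
given slave are cache hits.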

Let me know if you have already considered/evaluated this scheme.

Du
