We have definitely thought about this, but I'm not sure we have a ticket
tracking it. Feel free to create one.


On Wed, Sep 18, 2013 at 3:01 PM, Du Li <[email protected]> wrote:

> I was trying the latest code at https://github.com/apache/mesos. Take
> Spark as an example. I created a tgz of the Spark binary and put it on HDFS.
> After a job is submitted, it is decomposed into many tasks. For each task,
> the assigned Mesos slave downloads the tgz, unzips it, and executes a
> script to launch the task. This seems very wasteful and unnecessary.
>
> Does the following suggestion make sense? When a Spark job is submitted,
> the Spark/Mesos master calculates a checksum or something similar for the
> tgz distribution. The checksum is then sent to the slaves when tasks are
> assigned. If a slave has already downloaded and unzipped the same file, it
> launches the task directly. This way the tgz is processed at most once per
> slave for each job (which may have thousands of tasks). The aggregate
> saving would be tremendous.
>
> Let me know if you have already considered/evaluated this scheme.
>
> Du
>
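
For reference, the checksum scheme proposed above could look roughly like the
sketch below. It is only an illustration and is not taken from the Mesos or
Spark code: the names file_checksum and ensure_extracted, the fetch callable
(standing in for the HDFS download), and the cache directory layout are all
assumptions, and SHA-256 is just one possible checksum.

```python
import hashlib
import os
import tarfile
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_extracted(checksum, fetch, cache_root):
    """Return (directory, was_cached) for the distribution with `checksum`.

    `fetch` is a hypothetical callable that downloads the tgz to the local
    path it is given (for example, by copying it out of HDFS).
    """
    target = os.path.join(cache_root, checksum)
    if os.path.isdir(target):
        # An earlier task on this slave already unpacked the distribution.
        return target, True

    os.makedirs(cache_root, exist_ok=True)
    staging = tempfile.mkdtemp(dir=cache_root)
    archive = os.path.join(staging, "dist.tgz")
    unpacked = os.path.join(staging, "unpacked")
    os.makedirs(unpacked)

    fetch(archive)
    if file_checksum(archive) != checksum:
        raise ValueError("downloaded archive does not match the checksum")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(unpacked)
    os.remove(archive)

    # Rename into place so later tasks never see a half-unpacked directory.
    # This sketch does not handle two tasks racing to unpack the same file.
    os.rename(unpacked, target)
    os.rmdir(staging)
    return target, False
```

The master would compute file_checksum once per job and send the result along
with each task; a slave would call ensure_extracted before launching, so only
the first task of a job on that slave pays for the download and unzip.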
