[ https://issues.apache.org/jira/browse/MESOS-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772438#comment-13772438 ]
Benjamin Hindman commented on MESOS-700:
----------------------------------------
Also, one suggestion for an implementation of this:
(1) Add an 'md5' or 'sha' or 'sha1' field to the CommandInfo.URI protocol
buffer message.
(2) Add some code in the slave (launcher) to look in a well-known directory for
URIs before downloading, checking the checksum as appropriate. If found, copy
that tgz/zip (or the extracted contents); otherwise, download the URI. Note
that if a downloaded URI does not match the checksum provided, the slave should
kill the task/executor with an appropriate error message.
(3) Add some state to the slave so that it garbage collects the downloaded
tgz/zip (or extracted contents) after some period of time.
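
To make steps (2) and (3) concrete, here is a rough sketch of the lookup,
verify, and garbage-collect logic. It is in Python purely for illustration (the
slave itself is C++), and every name in it (fetch_uri, collect_garbage,
ChecksumMismatch, the cache directory layout, the 'download' hook) is
hypothetical and not part of Mesos. It assumes the cache is keyed by the
checksum, that sha1 is the algorithm, and that the caller turns a mismatch into
a task/executor failure.

```python
import hashlib
import os
import shutil
import time
import urllib.request


class ChecksumMismatch(Exception):
    """Raised when a downloaded file does not match the expected checksum."""


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_uri(uri, checksum, cache_dir, dest_dir, algorithm="sha1", download=None):
    """Copy the archive from the cache if a valid copy exists, else download it.

    'download' is a hypothetical hook taking (uri, local_path). It defaults to
    urllib's urlretrieve, which does not understand hdfs:// URIs, so a real
    fetcher would have to plug in its own transport here.
    """
    download = download or urllib.request.urlretrieve
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(dest_dir, exist_ok=True)
    cached = os.path.join(cache_dir, checksum)  # cache entry is named by checksum

    # Step (2): use the cached copy only if it exists and still verifies.
    if not (os.path.exists(cached) and file_digest(cached, algorithm) == checksum):
        tmp = cached + ".tmp"
        download(uri, tmp)
        if file_digest(tmp, algorithm) != checksum:
            os.remove(tmp)
            # The caller is expected to kill the task/executor on this error.
            raise ChecksumMismatch("checksum mismatch for %s" % uri)
        os.replace(tmp, cached)

    dest = os.path.join(dest_dir, os.path.basename(uri))
    shutil.copy(cached, dest)
    return dest


def collect_garbage(cache_dir, max_age_seconds, now=None):
    """Step (3): remove cache entries last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Keying the cache by checksum rather than by URI means two URIs pointing at the
same bytes share one entry, and a changed file behind an unchanged URI is never
served stale. The sketch ignores concurrent fetches of the same URI and caching
of extracted contents, both of which a real implementation would need to handle.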
> more efficient distribution of frameworks via HDFS
> --------------------------------------------------
>
> Key: MESOS-700
> URL: https://issues.apache.org/jira/browse/MESOS-700
> Project: Mesos
> Issue Type: Improvement
> Components: framework
> Affects Versions: 0.13.0, 0.14.0, 0.15.0
> Environment: general
> Reporter: Du Li
> Fix For: 0.13.0, 0.14.0, 0.15.0
>
>
> I was exploring the latest code (0.15.0) at https://github.com/apache/mesos
> to test the tgz distribution of frameworks. Take Spark for example. I created
> a tgz of the Spark binary and put it on HDFS. After a job is submitted, it is
> decomposed into many tasks. For each task, the assigned Mesos slave downloads
> the tgz from HDFS, unzips it, and executes some script to launch the task.
> This seems very wasteful and unnecessary.
> Does the following suggestion make sense? When a Spark job is submitted, the
> Spark/Mesos master calculates a checksum or something similar for the tgz
> distribution. Then the checksum is sent to the slaves when tasks are
> assigned. If the same file has already been downloaded/unzipped, a slave
> directly launches the task. This way the tgz is processed at most once for
> each job (which may have thousands of tasks). The aggregated saving would be
> tremendous.