[jira] [Commented] (MESOS-336) Mesos slave should cache executors

Benjamin Hindman (JIRA) Wed, 16 Apr 2014 17:56:29 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972159#comment-13972159
 ]


Benjamin Hindman commented on MESOS-336:
----------------------------------------

Glad to see you working on this [~bernd-mesos]!

First, once we have the checksum capability I'm not convinced we'd want to 
expose a "download once" capability. In fact, it might be something we wish we 
didn't expose in the interface (because, for example, what does it mean to 
download once if, when we have caching, we decide to evict it?).

Second, I'm not opposed to adding more caching support/functionality in the 
slave in the future to better manage persistent state but I'm not sure there is 
much (any?) persistent state we really need for a simple alpha here (aside from 
the actual downloaded files themselves) unless I'm missing something.

Here's what I was thinking:

(1) Add a flag to the slave with a path to a directory which will represent the 
cache (this lets someone put it on a RAM FS if they please, but it could 
default to just 'work_dir/cache').
(2) Add a flag to the slave for the total number of bytes (of URI downloads) to 
cache (again, defaulted to something reasonable).
(3) Add the md5/hash in CommandInfo as suggested in MESOS-700.
(4) Pass the directory ("cache") to the fetcher when it gets invoked.
(5) Update the fetcher: If the requested file exists in the cache and matches 
the checksum, then "touch" it and copy it into the sandbox, extract, chmod, etc. 
If the file doesn't exist, or doesn't match the checksum, download it 
(overwriting the cached copy when the checksum is wrong; we'll "fix" that 
later), copy it into the sandbox, extract, chmod, etc.
(6) Update the fetcher to always run a "cache eviction" before it exits that 
simply deletes the least recently modified files until we're below the cache 
limit.
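
To make steps (5) and (6) concrete, here is a rough sketch in Python. It is 
not the fetcher's actual code (which would be C++), and it leaves out extract 
and chmod. The names fetch, evict and download are hypothetical, as is the 
choice to key the cache on the last path component of the URI.

```python
import hashlib
import os
import shutil
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(uri, checksum, cache_dir, sandbox_dir, download):
    """Step (5): reuse the cached file if its checksum matches, else download.

    `download(uri, dest)` is a caller-supplied, hypothetical function that
    writes the contents of `uri` to the path `dest`.
    Returns True on a cache hit, False if a download was needed.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    sandbox_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the cache is keyed on the file name at the end of the URI.
    name = uri.rsplit("/", 1)[-1]
    cached = cache_dir / name

    if cached.is_file() and md5_of(cached) == checksum:
        os.utime(cached, None)  # "touch": bump mtime so eviction sees it as fresh
        hit = True
    else:
        download(uri, cached)  # missing or wrong checksum: overwrite the entry
        hit = False

    shutil.copy(cached, sandbox_dir / name)
    return hit


def evict(cache_dir, limit_bytes):
    """Step (6): delete least recently modified files until within the limit.

    Returns the names of the deleted files, oldest first.
    """
    files = sorted(
        (p for p in cache_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(p.stat().st_size for p in files)
    evicted = []
    for p in files:
        if total <= limit_bytes:
            break
        total -= p.stat().st_size
        p.unlink()
        evicted.append(p.name)
    return evicted
```

Because a cache hit touches the file, modification time doubles as a 
last-used time, so deleting by oldest mtime approximates LRU eviction without 
any extra persistent state.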

A lot of the generic caching code that gets written for this could eventually 
be moved into the slave (since it'll all be C++), but I don't see any reason 
not to start with it in the fetcher for now.

How does this sound?

> Mesos slave should cache executors
> ----------------------------------
>
>                 Key: MESOS-336
>                 URL: https://issues.apache.org/jira/browse/MESOS-336
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: brian wickman
>            Assignee: Bernd Mathiske
>              Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors.  In 
> our environment, executors rarely change but the slave will always pull them 
> down from HDFS regardless.  This puts undue stress on our HDFS clusters, and 
> is not resilient to reduced HDFS availability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
