[ https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968660#comment-13968660 ]

Bernd Mathiske commented on MESOS-336:
--------------------------------------

I suggest the following approach. All URI contents get downloaded into a 
fetcher result cache directory (short: fetch dir) per slave instead of a work 
dir per executor. Extraction of archives (e.g. *.tgz files) also happens per 
slave, inside the fetch dir. The extracted resources are then soft-linked into 
each executor's work dir.
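A minimal sketch of the download-once, link-many idea, assuming a pluggable `download(uri, dest)` callable stands in for the real HDFS/HTTP fetch. The class name `SharedFetchDir`, the hash-named subdirs and the demo paths are all hypothetical, and archive extraction is left out here to keep the focus on sharing one cached copy across executor work dirs:

```python
import hashlib
import os
import shutil
import tempfile
from typing import Callable, Dict


class SharedFetchDir:
    """Hypothetical per-slave fetch dir: each URI is downloaded once, and every
    executor work dir receives a symlink to the cached copy."""

    def __init__(self, fetch_dir: str, download: Callable[[str, str], None]):
        self.fetch_dir = fetch_dir
        self.download = download          # stand-in for the real HDFS/HTTP fetch
        self.cached: Dict[str, str] = {}  # URI -> path inside fetch_dir
        self.downloads = 0

    def _cached_path(self, uri: str) -> str:
        if uri not in self.cached:
            # Name the subdir after a hash of the URI so any URI maps to a safe path.
            subdir = os.path.join(self.fetch_dir,
                                  hashlib.sha256(uri.encode()).hexdigest())
            os.makedirs(subdir, exist_ok=True)
            dest = os.path.join(subdir, os.path.basename(uri))
            self.download(uri, dest)
            self.downloads += 1
            self.cached[uri] = dest
        return self.cached[uri]

    def link_into(self, uri: str, work_dir: str) -> str:
        # Soft-link the cached resource into the executor's work dir.
        target = self._cached_path(uri)
        link = os.path.join(work_dir, os.path.basename(uri))
        os.symlink(target, link)
        return link


# Demo: a local file stands in for an HDFS URI; shutil.copyfile plays "download".
root = tempfile.mkdtemp()
src = os.path.join(root, "executor.sh")
with open(src, "w") as f:
    f.write("echo hello\n")
cache = SharedFetchDir(os.path.join(root, "fetch"), shutil.copyfile)
links = []
for name in ("executor-1", "executor-2"):
    work_dir = os.path.join(root, name)
    os.makedirs(work_dir)
    links.append(cache.link_into(src, work_dir))
print(cache.downloads)  # 1: both executors share a single download
```

Symlinks keep per-executor disk usage near zero, at the cost that executors must not modify the shared files in place.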

How do we handle different users and chmod-ing for them? There is a separate fetch 
subdir for each fetched URI/user combination. In case of an archive, we extract 
and chmod once per user. If it's not an archive, we make a copy and chmod per 
user. In any case, we only download once, regardless of user settings.
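The per-user scheme could look roughly like this: downloads are keyed by URI alone, while prepared subdirs are keyed by (URI, user). This is a sketch under assumptions: `PerUserFetchCache`, the numbered subdir layout and the `set_permissions` hook are hypothetical, with the hook standing in for the per-user chmod (and, in practice, a chown that needs privileges):

```python
import os
import shutil
import tarfile
import tempfile
from typing import Callable, Dict, Tuple

ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


class PerUserFetchCache:
    """Hypothetical sketch: one download per URI, one prepared subdir per (URI, user)."""

    def __init__(self, fetch_dir: str,
                 download: Callable[[str, str], None],
                 set_permissions: Callable[[str, str], None]):
        self.fetch_dir = fetch_dir
        self.download = download                # stand-in for the real fetch
        self.set_permissions = set_permissions  # stand-in for chmod/chown to `user`
        self.downloads: Dict[str, str] = {}     # URI -> downloaded file
        self.prepared: Dict[Tuple[str, str], str] = {}  # (URI, user) -> subdir
        self._next_id = 0

    def _new_subdir(self) -> str:
        path = os.path.join(self.fetch_dir, str(self._next_id))
        self._next_id += 1
        os.makedirs(path)
        return path

    def _downloaded(self, uri: str) -> str:
        # Download at most once per URI, however many users ask for it.
        if uri not in self.downloads:
            dest = os.path.join(self._new_subdir(), os.path.basename(uri))
            self.download(uri, dest)
            self.downloads[uri] = dest
        return self.downloads[uri]

    def get(self, uri: str, user: str) -> str:
        key = (uri, user)
        if key not in self.prepared:
            blob = self._downloaded(uri)
            subdir = self._new_subdir()
            if blob.endswith(ARCHIVE_SUFFIXES):
                # Archive: extract a private tree for this user.
                kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
                with tarfile.open(blob) as tar:
                    tar.extractall(subdir, **kwargs)
            else:
                # Plain file: give this user a private copy.
                shutil.copy2(blob, subdir)
            # Adjust permissions on everything in the user's subdir.
            for dirpath, dirnames, filenames in os.walk(subdir):
                for name in dirnames + filenames:
                    self.set_permissions(os.path.join(dirpath, name), user)
            self.set_permissions(subdir, user)
            self.prepared[key] = subdir
        return self.prepared[key]


# Demo with a local file as the "remote" URI and a recording permissions hook.
root = tempfile.mkdtemp()
src = os.path.join(root, "run.sh")
with open(src, "w") as f:
    f.write("#!/bin/sh\n")
calls = []
def fake_download(uri, dest):
    calls.append(uri)
    shutil.copyfile(uri, dest)
granted = []
cache = PerUserFetchCache(os.path.join(root, "fetch"), fake_download,
                          lambda path, user: granted.append((os.path.basename(path), user)))
alice = cache.get(src, "alice")
bob = cache.get(src, "bob")
again = cache.get(src, "alice")
print(len(calls))  # 1: two users, one download
```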

The main problem I am facing now is persisting which URIs have been downloaded 
and which fetch subdir each one resulted in. This info needs to be kept at least 
for the lifetime of the slave process. (There is no need to go beyond that: if a 
slave fails, we can simply wipe the entire fetch cache on recovery.) It would be 
simpler, and would make the source code less fragile, if the fetcher were part 
of the slave program rather than a separate program. But I reckon we can still 
keep the required state in the slave's dynamic memory and use it to direct 
fetcher program invocations. Then, though, we have to be careful to keep what 
the fetcher does and what the slave knows in sync.
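One way to keep the slave's in-memory state and the fetcher's disk writes in sync is to let the slave pick the target subdir, pass it to the fetcher, and record the entry only after the fetcher reports success. A sketch under assumptions: `SlaveFetchState`, the `run_fetcher` callable (standing in for launching the separate fetcher program) and the `hdfs://example/...` URIs are all hypothetical:

```python
import os
import shutil
import tempfile
from typing import Callable, Dict, Tuple


class SlaveFetchState:
    """Hypothetical in-memory record, held by the slave, of which (URI, user)
    pairs have been fetched into which subdir of the fetch dir."""

    def __init__(self, fetch_dir: str):
        self.fetch_dir = fetch_dir
        self.entries: Dict[Tuple[str, str], str] = {}
        self._next_id = 0

    def recover(self) -> None:
        # The state lives only in memory, so after a slave restart nothing on
        # disk can be trusted: wipe the whole fetch cache and start empty.
        shutil.rmtree(self.fetch_dir, ignore_errors=True)
        os.makedirs(self.fetch_dir)
        self.entries.clear()
        self._next_id = 0

    def fetch(self, uri: str, user: str,
              run_fetcher: Callable[[str, str, str], bool]) -> str:
        key = (uri, user)
        if key in self.entries:
            return self.entries[key]  # cache hit: no fetcher run needed
        subdir = os.path.join(self.fetch_dir, str(self._next_id))
        self._next_id += 1
        # The slave chooses the target subdir and tells the fetcher where to
        # put the result, so both sides agree on the layout.
        if not run_fetcher(uri, user, subdir):
            shutil.rmtree(subdir, ignore_errors=True)  # drop partial output
            raise RuntimeError(f"fetcher failed for {uri} as {user}")
        self.entries[key] = subdir  # record only after the fetcher succeeded
        return subdir


root = tempfile.mkdtemp()
state = SlaveFetchState(os.path.join(root, "fetch"))
state.recover()  # fresh slave start: empty cache
runs = []
def fake_fetcher(uri, user, target_dir):
    # Stand-in for invoking the separate fetcher program with a target dir.
    runs.append((uri, user))
    os.makedirs(target_dir)
    with open(os.path.join(target_dir, "payload"), "w") as f:
        f.write(uri)
    return True
first = state.fetch("hdfs://example/executor.tgz", "alice", fake_fetcher)
second = state.fetch("hdfs://example/executor.tgz", "alice", fake_fetcher)
print(len(runs))  # 1: the second request is served from the slave's record
```

Recording only on success means a crashed or failed fetcher run never leaves the slave pointing at a half-written subdir.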


> Mesos slave should cache executors
> ----------------------------------
>
>                 Key: MESOS-336
>                 URL: https://issues.apache.org/jira/browse/MESOS-336
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: brian wickman
>            Assignee: Bernd Mathiske
>              Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In 
> our environment, executors rarely change, but the slave always pulls them 
> down from HDFS regardless. This puts undue stress on our HDFS clusters and 
> is not resilient to reduced HDFS availability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)