[ 
https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970207#comment-13970207
 ] 

Bernd Mathiske commented on MESOS-336:
--------------------------------------

Agreed that we should split the task like that, and the first task can be split 
up even further. I would prefer to start with a solution without checksums, get 
that running, and then add checksums in a subsequent patch. There is plenty of 
refactoring to do just to support ANY caching.

I propose that the first patch add a switch that toggles between the current 
behavior ("always download") and "download only once". (In both cases, package 
extraction and chmod will be supported.)
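To make the proposed switch concrete, here is a minimal Python sketch. Mesos itself is C++, and the names `FetchMode` and `needs_download` are hypothetical, not part of the Mesos API. The function only decides whether a URI has to be downloaded; extraction and chmod would run in both modes.

```python
from enum import Enum
from typing import Set


class FetchMode(Enum):
    """Hypothetical switch between the two proposed fetch behaviors."""
    ALWAYS = "always"  # current behavior: download on every task launch
    ONCE = "once"      # proposed: download a URI only the first time it is seen


def needs_download(mode: FetchMode, uri: str, cached_uris: Set[str]) -> bool:
    """Decide whether `uri` must be downloaded under `mode`.

    Extraction and chmod are not decided here; they apply in both modes.
    """
    if mode is FetchMode.ALWAYS:
        return True
    return uri not in cached_uris
```

Keeping the decision in one small function would let the default remain `ALWAYS`, so existing installations would see no change in behavior until the switch is flipped.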

The next patch could introduce checksums and one or two ways of using these.
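One possible way to use checksums in that later patch is sketched below in Python. It assumes, hypothetically, that a SHA-256 digest may be supplied alongside each URI; the digest choice and the function names are illustrative only. A cached copy is reused only if it exists and, when a digest is given, its contents match; otherwise the slave would download again.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large executor bundles need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_copy_is_valid(path: Path, expected_sha256: Optional[str]) -> bool:
    """Hypothetical reuse policy for a cached download.

    Without a checksum this degrades to plain "download only once";
    with one, a stale or corrupted cache entry is detected and rejected.
    """
    if not path.is_file():
        return False
    if expected_sha256 is None:
        return True
    return sha256_of(path) == expected_sha256.lower()
```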

The main obstacle to implementing the first caching behavior is that the 
fetcher is an external program. I see at least two good reasons for that design, 
and I do want to keep supporting both:
1. Allow users to replace the fetcher in their installation with a custom 
solution without recompiling Mesos.
2. Disentangle Mesos from libraries that are only used for fetching.

So I propose to keep the purely download-oriented part of the fetcher in the 
separate program, much as it is implemented now, but to move everything 
downstream of the download (chmod, extraction) into the slave's OS process. Why? 
Because at runtime these steps have to happen AFTER the bookkeeping that supports 
the download caching behavior, and THAT bookkeeping should live in the slave. 
Otherwise we need secondary storage to keep track of cache state, explicit 
writes and reads of that state by the fetcher, plus extra failure modes to 
program against. A little code refactoring lets us avoid the significant extra 
complexity of persisting state entirely.
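The split described above can be sketched as follows. This is illustrative Python, not Mesos code: the class `SlaveFetchCache` and its parameters are hypothetical, and the `download` callable stands in for invoking the external fetcher program, which in this sketch only has to place the raw bytes at a given path. The cache state is a plain in-memory map, and extraction and chmod run in the slave after the bookkeeping.

```python
import os
import shutil
import stat
import tarfile
from pathlib import Path
from typing import Callable, Dict


class SlaveFetchCache:
    """Hypothetical slave-side cache bookkeeping, kept purely in memory."""

    def __init__(self, cache_dir: Path, download: Callable[[str, Path], None]):
        self.cache_dir = cache_dir
        # Stand-in for running the external fetcher program: it only has to
        # place the raw bytes for `uri` at the given destination path.
        self.download = download
        # URI -> cached file; lives only in this process, nothing is persisted.
        self.entries: Dict[str, Path] = {}

    def fetch(self, uri: str, sandbox: Path, extract: bool, executable: bool) -> Path:
        # 1. Bookkeeping: download only on a cache miss ("download only once").
        cached = self.entries.get(uri)
        if cached is None:
            name = os.path.basename(uri)
            cached = self.cache_dir / f"{len(self.entries)}-{name}"
            self.download(uri, cached)
            self.entries[uri] = cached

        # 2. Downstream steps, performed by the slave after the bookkeeping.
        target = sandbox / os.path.basename(uri)
        shutil.copyfile(cached, target)
        if extract and tarfile.is_tarfile(str(target)):
            # Use the safer "data" extraction filter where this Python has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            with tarfile.open(target) as tar:
                tar.extractall(sandbox, **kwargs)
        if executable:
            mode = target.stat().st_mode
            target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        return target
```

Because the map lives in the slave process, there is no on-disk index to keep consistent with the cache directory; the trade-off in this sketch is that a restarted process starts with an empty map. It also ignores concurrent fetches of the same URI and cache eviction, which a real implementation would have to handle.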



> Mesos slave should cache executors
> ----------------------------------
>
>                 Key: MESOS-336
>                 URL: https://issues.apache.org/jira/browse/MESOS-336
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: brian wickman
>            Assignee: Bernd Mathiske
>              Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In 
> our environment, executors rarely change, but the slave always pulls them 
> down from HDFS regardless. This puts undue stress on our HDFS clusters and 
> is not resilient to reduced HDFS availability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
