[ https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970207#comment-13970207 ]
Bernd Mathiske commented on MESOS-336:
--------------------------------------
Agreed that we should split the task that way. The first task can be split up
even further. I would prefer to start with a solution without checksums, get
that running, and then add checksums in a subsequent patch. Supporting ANY
caching at all already requires plenty of refactoring.
I propose that the first patch add a switch that toggles between the current
behavior ("download always") and "download only once". In both cases, package
extraction and chmod will be supported.
The next patch could introduce checksums and one or two ways of using them.
The main obstacle to implementing the first cache behavior is that the fetcher
is an external program. I see at least two good reasons for keeping it
external, and I do want to continue supporting them:
1. Allow users to replace the fetcher in their installation with a custom
solution without recompiling Mesos.
2. Disentangle Mesos from libraries that are only used for fetching.
So I propose to keep the purely download-oriented part of the fetcher in the
separate program, implemented much as it is now, but to move everything
downstream of the download (chmod, extraction) into the slave process. Why?
Because at runtime these steps have to happen AFTER the bookkeeping that
supports the download caching behavior, and THAT bookkeeping should live in
the slave. Otherwise we would need secondary storage to track the cache state,
the fetcher would have to explicitly write and read that state, and there
would be extra failure modes to program against. A modest amount of code
refactoring avoids the significant extra complexity of persisting this state
entirely.
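A minimal C++ sketch of the slave-side bookkeeping proposed above, assuming the external fetcher can be modeled as a callback that downloads a URI and returns a local path. `FetcherCache`, `FetchPolicy`, and `DownloadFn` are hypothetical names, not Mesos APIs, and the sketch ignores checksums, eviction, concurrency, and download failures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for running the external fetcher program:
// given a URI, it downloads the artifact and returns the local path.
using DownloadFn = std::function<std::string(const std::string& uri)>;

// The proposed switch: always download (current behavior) or download once.
enum class FetchPolicy { ALWAYS_DOWNLOAD, DOWNLOAD_ONCE };

// Slave-side bookkeeping sketch. The cache state lives only in memory in
// the slave process, so the external fetcher never reads or writes it.
class FetcherCache {
 public:
  FetcherCache(FetchPolicy policy, DownloadFn download)
      : policy_(policy), download_(std::move(download)) {}

  // Returns the local path for `uri`, invoking the downloader only when
  // the policy requires it.
  std::string fetch(const std::string& uri) {
    if (policy_ == FetchPolicy::DOWNLOAD_ONCE) {
      auto it = cache_.find(uri);
      if (it != cache_.end()) {
        return it->second;  // Cache hit: skip the download entirely.
      }
    }

    std::string path = download_(uri);
    assert(!path.empty() && "downloader must return a local path");

    if (policy_ == FetchPolicy::DOWNLOAD_ONCE) {
      cache_[uri] = path;  // Record the result for later fetches.
    }

    // In the proposed split, chmod and archive extraction would run here,
    // in the slave, after the bookkeeping above (not shown).
    return path;
  }

 private:
  FetchPolicy policy_;
  DownloadFn download_;
  std::map<std::string, std::string> cache_;
};
```

With `DOWNLOAD_ONCE`, fetching the same URI twice invokes the downloader only once; with `ALWAYS_DOWNLOAD`, it runs every time, matching the current behavior. Keeping the map inside the slave process is what avoids persisting cache state that the fetcher would otherwise have to read and write.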
> Mesos slave should cache executors
> ----------------------------------
>
> Key: MESOS-336
> URL: https://issues.apache.org/jira/browse/MESOS-336
> Project: Mesos
> Issue Type: Improvement
> Components: slave
> Reporter: brian wickman
> Assignee: Bernd Mathiske
> Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In
> our environment, executors rarely change, but the slave will always pull them
> down from HDFS regardless. This puts undue stress on our HDFS clusters, and
> is not resilient to reduced HDFS availability.