[
https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973480#comment-13973480
]
Bernd Mathiske commented on MESOS-336:
--------------------------------------
The last sentence was written a bit too quickly. :-) There has to be some extra
complexity somewhere to deal with concurrency issues. Here is what I have in
mind regarding that.
Problems: There can be multiple mesos-fetcher programs running concurrently. If
they all download the same URI, they could trample over the same cache file.
Plus it is inefficient to have multiple downloads of the same content running.
They won't finish any sooner by THAT sort of work parallelization. :-) Handling
such problems in the fetcher program seems to be a bad idea.
Solution: Prohibit concurrent downloads of the same URI BEFORE they end up
in invocations of the fetcher program. Only one fetch attempt follows through,
undisturbed and without having to share its bandwidth. After this single fetch
has completed, all waiting parties get notified and proceed by reading from the
file cache.
We can use libprocess futures to organize this inside the slave/containerizer
program. If I understand libprocess correctly, we do not even need thread
synchronization, a thread-safe data structure or any of that. That's because
all the relevant action happens somewhere inside the Slave::runTask method,
which is serialized by nature of being installed as a message handler.
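
A minimal sketch of the coalescing idea, written in plain standard C++ with
callbacks standing in for libprocess futures. The class FetchCoalescer and its
methods are hypothetical names for illustration, not existing Mesos or
libprocess APIs. It assumes all calls arrive from one serialized context, as in
a message handler, so it uses no locks.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: coalesces fetch requests per URI. Assumes every method
// is called from a single serialized context, so no locking is needed.
class FetchCoalescer {
public:
  using Callback = std::function<void(const std::string& cachePath)>;

  // Registers interest in `uri`. Returns true if the caller is the first
  // requester and must start the download; false if one is already in flight.
  bool request(const std::string& uri, Callback onReady) {
    const bool first = (waiters_.find(uri) == waiters_.end());
    waiters_[uri].push_back(std::move(onReady));
    return first;
  }

  // Called once when the single download of `uri` has finished. Notifies
  // every waiting party, including the one that started the download.
  void completed(const std::string& uri, const std::string& cachePath) {
    auto it = waiters_.find(uri);
    assert(it != waiters_.end());  // completed() without request() is a bug.
    std::vector<Callback> ready = std::move(it->second);
    waiters_.erase(it);  // Erase first so a callback may request the URI again.
    for (const Callback& callback : ready) {
      callback(cachePath);
    }
  }

private:
  std::map<std::string, std::vector<Callback>> waiters_;
};
```

Only the caller for which request() returns true would launch the fetcher
program; everyone else just waits for completed(). The sketch tracks in-flight
downloads only, so a lookup in the file cache would still have to come before
request().
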
Reading from the file cache can be implemented reusing the same fetcher program
as above, simply by rewriting the URI to a local file URI.
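
The URI rewriting could look roughly like the following sketch. CacheIndex and
resolveFetchUri are hypothetical names, and the sketch assumes the cache records
an absolute local path for each URI that has already been downloaded.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical index: remote URI -> absolute path of its cached copy.
using CacheIndex = std::map<std::string, std::string>;

// Returns the URI to hand to the fetcher program: a local file URI if the
// content is cached, otherwise the original URI unchanged.
std::string resolveFetchUri(const CacheIndex& cache, const std::string& uri) {
  auto it = cache.find(uri);
  if (it == cache.end()) {
    return uri;  // Not cached: fetch from the original location.
  }
  // Assumption of this sketch: cached paths are absolute.
  assert(!it->second.empty() && it->second.front() == '/');
  return "file://" + it->second;
}
```

With this, the fetcher program itself needs no knowledge of the cache; it just
sees a different URI.
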
> Mesos slave should cache executors
> ----------------------------------
>
> Key: MESOS-336
> URL: https://issues.apache.org/jira/browse/MESOS-336
> Project: Mesos
> Issue Type: Improvement
> Components: slave
> Reporter: brian wickman
> Assignee: Bernd Mathiske
> Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In
> our environment, executors rarely change but the slave will always pull them
> down from HDFS regardless. This puts undue stress on our HDFS clusters, and
> is not resilient to reduced HDFS availability.