[ 
https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973480#comment-13973480
 ] 

Bernd Mathiske commented on MESOS-336:
--------------------------------------

The last sentence was written a bit too quickly. :-) There has to be some extra 
complexity somewhere to deal with concurrency issues. Here is what I have in 
mind regarding that.

Problems: There can be multiple mesos-fetcher programs running concurrently. If 
they all download the same URI, they could trample over the same cache file. 
Plus it is inefficient to have multiple downloads of the same content running. 
They won't finish any sooner by THAT sort of work parallelization. :-) Handling 
such problems in the fetcher program seems to be a bad idea.

Solution: Prohibit concurrent downloads of the same URI BEFORE they end up 
in invocations of the fetcher program. Only one fetch attempt follows through, 
undisturbed and without having to share its bandwidth. After this single fetch 
has completed, all waiting parties get notified and proceed by reading from the 
file cache. 

We can use libprocess futures to organize this inside the slave/containerizer 
program. If I understand libprocess correctly, we do not even need thread 
synchronization, a thread-safe data structure or any of that. That's because 
all the relevant action happens somewhere inside the Slave::runTask method, 
which is serialized by nature of being installed as a message handler.
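
To make the idea concrete, here is a rough sketch of the bookkeeping. It uses 
plain std::promise/std::shared_future as a stand-in for libprocess futures, 
and the names FetchCoordinator, fetch, complete and pending are made up for 
illustration; none of this is existing Mesos code. It assumes all calls are 
serialized (as in a message handler), so the map carries no lock. Failure 
handling for a download that dies halfway is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: tracks at most one in-flight download per URI.
// Assumes every call is made from one serialized context, so no locking.
class FetchCoordinator {
public:
  // Returns a future that becomes ready with the cache file path.
  // Sets *started to true only for the caller that must run the download;
  // every later caller for the same URI just waits on the shared future.
  std::shared_future<std::string> fetch(const std::string& uri,
                                        bool* started) {
    assert(started != nullptr);
    auto it = pending_.find(uri);
    if (it != pending_.end()) {
      *started = false;
      return it->second.future;
    }
    Entry entry;
    entry.future = entry.promise.get_future().share();
    std::shared_future<std::string> result = entry.future;
    pending_.emplace(uri, std::move(entry));
    *started = true;
    return result;
  }

  // Called once the single download has finished: notifies all waiting
  // parties and forgets the URI. Unknown URIs are ignored.
  void complete(const std::string& uri, const std::string& cachePath) {
    auto it = pending_.find(uri);
    if (it == pending_.end()) {
      return;
    }
    it->second.promise.set_value(cachePath);
    pending_.erase(it);
  }

  // Number of URIs that currently have a download in flight.
  std::size_t pending() const { return pending_.size(); }

private:
  struct Entry {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;
  };
  std::map<std::string, Entry> pending_;
};
```

The first caller for a URI gets started == true and launches the fetcher; 
everyone else holds the same future and proceeds to the cache once it is 
ready.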

Reading from the file cache can be implemented reusing the same fetcher program 
as above, simply by rewriting the URI to a local file URI.
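
The rewriting step could look roughly like this. The cache directory layout 
and the file naming scheme (replace every character other than letters, 
digits, '.' and '-' with '_') are assumptions for illustration only; a real 
cache would need a naming scheme that cannot collide. cacheFileName and 
toLocalFileUri are hypothetical helpers, not existing Mesos functions.

```cpp
#include <cctype>
#include <string>

// Hypothetical naming scheme: keep letters, digits, '.' and '-',
// and replace every other character with '_'.
std::string cacheFileName(const std::string& uri) {
  std::string name;
  name.reserve(uri.size());
  for (unsigned char c : uri) {
    bool keep = std::isalnum(c) || c == '.' || c == '-';
    name += keep ? static_cast<char>(c) : '_';
  }
  return name;
}

// Rewrites a remote URI to the file URI of its (assumed) cache entry,
// so that the unchanged fetcher program can copy it from local disk.
std::string toLocalFileUri(const std::string& cacheDir,
                           const std::string& uri) {
  return "file://" + cacheDir + "/" + cacheFileName(uri);
}
```

With a cache directory of /tmp/mesos/cache, hdfs://nn/executor.tgz would be 
rewritten to file:///tmp/mesos/cache/hdfs___nn_executor.tgz.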



> Mesos slave should cache executors
> ----------------------------------
>
>                 Key: MESOS-336
>                 URL: https://issues.apache.org/jira/browse/MESOS-336
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: brian wickman
>            Assignee: Bernd Mathiske
>              Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors.  In 
> our environment, executors rarely change but the slave will always pull them 
> down from HDFS regardless.  This puts undue stress on our HDFS clusters, and 
> is not resilient to reduced HDFS availability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
