[
https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972220#comment-13972220
]
Bernd Mathiske commented on MESOS-336:
--------------------------------------
@Vinod: In general, we cannot encode a URI in a filename, because filenames
have limited length (e.g. 255 characters) and URIs can be longer than that. Any
encoding short enough to always fit would be lossy and could, in principle,
lead to collisions.
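To make the length problem concrete, here is a minimal sketch assuming the fetcher would form a cache filename by percent-encoding every unsafe character of the URI (including "/"), and assuming a 255-byte filename limit, the common NAME_MAX on Linux filesystems such as ext4. `uri_to_filename`, `filename_to_uri`, and `fits` are hypothetical illustrations, not Mesos code.

```python
from urllib.parse import quote, unquote

# Assumed per-component filename limit; 255 bytes is typical (e.g. ext4),
# but the real value depends on the filesystem.
NAME_MAX = 255


def uri_to_filename(uri: str) -> str:
    """Hypothetical lossless encoding: percent-encode every character
    that is not safe in a filename, including '/'."""
    return quote(uri, safe="")


def filename_to_uri(name: str) -> str:
    """Inverse of uri_to_filename; recovers the original URI exactly."""
    return unquote(name)


def fits(uri: str) -> bool:
    # Limits are in bytes; quote() output is pure ASCII.
    return len(uri_to_filename(uri).encode("ascii")) <= NAME_MAX


short = "hdfs://namenode:8020/executors/thermos.pex"
long_uri = "hdfs://namenode:8020/" + "/".join(["executors"] * 40) + "/thermos.pex"

print(fits(short))     # True
print(fits(long_uri))  # False: the lossless name is too long
```

Any scheme that shortens the name so that it always fits, such as truncation or hashing, maps many possible URIs onto fewer names and therefore cannot be injective, which is the collision risk mentioned above.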
@Ben: OK, so you would trust the probability of a collision to be small
enough. Fair enough. Maybe use SHA-256 then. But how do you know the checksum
of the resource behind the second URL in your example without downloading it
first? Also, I don't think that loading the same resource from multiple
different URIs is an important use case.
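The checksum-keyed scheme amounts to a content-addressed cache. A minimal sketch, assuming entries are keyed by the SHA-256 hex digest of the downloaded bytes and kept in memory for brevity (`ContentCache`, `put`, and `get` are hypothetical names), shows the chicken-and-egg problem: `get` needs the digest, and from a bare URI the only way to compute it is to download the bytes first.

```python
import hashlib
from typing import Dict, Optional


class ContentCache:
    """Hypothetical content-addressed cache (in memory for brevity):
    entries are keyed by the SHA-256 digest of their bytes, not by URI."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store downloaded bytes and return their checksum (the key)."""
        key = self.digest(data)
        self._blobs[key] = data
        return key

    def get(self, checksum: str) -> Optional[bytes]:
        """Look up by checksum; the caller must already know it."""
        return self._blobs.get(checksum)


cache = ContentCache()
executor = b"#!/bin/sh\necho executor\n"

# First URI: the bytes must be downloaded before their key can be computed.
key = cache.put(executor)

# A second URI serving identical bytes is a hit only if its checksum is
# known up front (e.g. supplied next to the URI); the URI alone is no key.
print(cache.get(key) == executor)  # True
```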
Back to your previous comment above, where you wrote "If the requested file
exists": what identifies "the requested file"? Primarily, it's the URI, not its
contents. Using a checksum flips this: the contents become the identity. So if
the framework presents the checksum in CommandInfo as described in MESOS-700
instead of a URI, where does that checksum come from? The framework user would
have to put it there. Or there would have to be a URI->checksum mapping that is
derived from the first download. Here you have two choices. You can persist
that mapping and have the fetcher write and read it. Or you can keep it in the
slave. I am opting for the latter, because it does not require me to write all
that I/O code for the mapping. But then it turns out that once you have any
mapping, you might as well map URIs to cache files and the checksum becomes
irrelevant...
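The option favored here, where the slave keeps the URI-to-cache-file mapping in memory, can be sketched as follows. This is an illustration under assumptions, not the Mesos implementation: `SlaveFetchCache` and the injected `download(uri, path)` callback are hypothetical, and cache files get sequential names so that URI length never matters. No checksum appears anywhere; the URI alone identifies the cached file.

```python
import os
import tempfile
from typing import Callable, Dict, List


class SlaveFetchCache:
    """Hypothetical in-memory map from URI to a local cache file."""

    def __init__(self, cache_dir: str,
                 download: Callable[[str, str], None]) -> None:
        self._dir = cache_dir
        self._download = download  # fetches `uri` into local `path`
        self._files: Dict[str, str] = {}
        self._next_id = 0

    def fetch(self, uri: str) -> str:
        """Return a local path for `uri`, downloading only on a miss."""
        path = self._files.get(uri)
        if path is None:
            # Short, collision-free name, independent of the URI's length.
            path = os.path.join(self._dir, "c%d" % self._next_id)
            self._next_id += 1
            self._download(uri, path)
            self._files[uri] = path
        return path


calls: List[str] = []


def fake_download(uri: str, path: str) -> None:
    # Stand-in for an HDFS/HTTP fetch; records each real download.
    calls.append(uri)
    with open(path, "wb") as f:
        f.write(uri.encode())


with tempfile.TemporaryDirectory() as d:
    cache = SlaveFetchCache(d, fake_download)
    p1 = cache.fetch("hdfs://nn/executors/a.pex")
    p2 = cache.fetch("hdfs://nn/executors/a.pex")
    print(p1 == p2, len(calls))  # True 1
```

Because the map lives only in the slave's memory, it is lost when the slave restarts; persisting it is the other option mentioned above, at the cost of the extra I/O code.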
This would all be easier if the URIs came with checksums pre-attached. Do they?
> Mesos slave should cache executors
> ----------------------------------
>
> Key: MESOS-336
> URL: https://issues.apache.org/jira/browse/MESOS-336
> Project: Mesos
> Issue Type: Improvement
> Components: slave
> Reporter: brian wickman
> Assignee: Bernd Mathiske
> Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In
> our environment, executors rarely change but the slave will always pull them
> down from HDFS regardless. This puts undue stress on our HDFS clusters, and
> is not resilient to reduced HDFS availability.
--
This message was sent by Atlassian JIRA
(v6.2#6252)