[
https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574187#comment-14574187
]
Bernd Mathiske commented on MESOS-2073:
---------------------------------------
To come up with a design for checksum-based automatic refresh, we can
distinguish two scenarios:
1. The framework/user provides the checksum at the site from which the URI
in question is to be downloaded.
2. The URI leads to a third party that provides the checksum.
Furthermore, we can distinguish:
A. We trust the checksum to be correct.
B. We do not.
In scenario 2, checksums can stem from a variety of potential sources that may
differ widely depending on the protocol specified by the URI. For example, with
HTTP we may be using a header field called "Content-MD5", but for FTP, S3,
HDFS, and so on, this is different.
For some protocols, it is not uncommon to provide a second file that contains
the checksum next to the original, with an extra extension like "<original
URI>.md5". It turns out that we can generalize this approach to a simple
solution that also covers scenario 1A (see phase 1 below).
I propose the following phased approach, with phase 1 as intended MVP for Mesos
0.23.
"Phase 1":
- We add a (string) field "checksum_uri" to the CommandInfo::URI protobuf
message struct.
- The framework user writes a checksum of her choosing into a file at the URI
specified in the above field.
- The fetcher downloads the checksum file every time before fetching the
content file.
- If the checksum is the same as last time and a cache file is present, the
content download is skipped and the cached content is used (copied to the
executor sandbox).
- If the checksum has changed, the presumably renewed content gets downloaded
into the cache, which gets "refreshed".
- The new checksum is saved by the Fetcher so it can be compared with the
checksum of the same associated URI when it comes up for download again.
- If the checksum_uri field has no value, the fetcher cache acts as currently
implemented, without automatic refresh.
- If there is a checksum_uri value, but downloading the checksum fails, the
Fetcher logs a warning and falls back on not caching at all.
So far this supports any checksum algorithm, since the choice of algorithm is
entirely in the hands of the framework. However, this
only works for scenario 1A and for constellations of scenario 2A that are
congruent with this setup (e.g. MD5 files next to content files).
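The phase 1 decision flow can be sketched as follows. This is only an
illustrative in-memory model, not Fetcher code: the class name, the `download`
callable, and the use of OSError to signal a failed download are assumptions
made for the example, and "not caching at all" is read here as fetching the
content directly without touching the cache.

```python
import logging
from typing import Callable, Dict, Optional


class FetcherCacheSketch:
    """Hypothetical model of the phase 1 refresh logic (not Mesos code)."""

    def __init__(self, download: Callable[[str], bytes]):
        # Assumed helper: returns the bytes at a URI, raises OSError on failure.
        self.download = download
        self.checksums: Dict[str, bytes] = {}  # content URI -> last checksum
        self.cache: Dict[str, bytes] = {}      # content URI -> cached content

    def fetch(self, uri: str, checksum_uri: Optional[str] = None) -> bytes:
        if checksum_uri is None:
            # No checksum_uri: plain caching, no automatic refresh.
            if uri not in self.cache:
                self.cache[uri] = self.download(uri)
            return self.cache[uri]

        try:
            # The checksum file is downloaded before every content fetch.
            checksum = self.download(checksum_uri)
        except OSError:
            # Checksum unavailable: warn and bypass the cache entirely.
            logging.warning("checksum download failed for %s", checksum_uri)
            return self.download(uri)

        if uri in self.cache and self.checksums.get(uri) == checksum:
            # Checksum unchanged and cache file present: skip the download.
            return self.cache[uri]

        # Checksum changed (or nothing cached yet): refresh the cache and
        # remember the new checksum for the next comparison.
        content = self.download(uri)
        self.cache[uri] = content
        self.checksums[uri] = checksum
        return content
```

Note that the checksum is compared as an opaque byte string, which is why the
framework is free to choose the algorithm in this phase.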
"Phase 2":
- For scenario 1B and 2B, we provide ways for the fetcher to verify the
checksum.
- This requires checksum algorithm code in the fetcher.
- We pair every newly supported checksum algorithm with an enum value that can
be specified in the CommandInfo::URI message. Initially, this will be MD5 only.
I reckon SHA-1 would be next. Suggestions welcome.
- If checksum verification fails, the Fetcher logs a warning and falls back on
not caching.
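As a rough illustration of phase 2, verification could look like the sketch
below. The enum and function names are hypothetical, and the checksum is
assumed to arrive as a hex string; the actual representation in the
CommandInfo::URI message is still to be decided.

```python
import enum
import hashlib


class ChecksumAlgorithm(enum.Enum):
    """Hypothetical enum; MD5 is the only initially supported value."""
    MD5 = "md5"


def verify_checksum(content: bytes, expected_hex: str,
                    algorithm: ChecksumAlgorithm = ChecksumAlgorithm.MD5) -> bool:
    """Recompute the digest of the content and compare it with the claim."""
    actual = hashlib.new(algorithm.value, content).hexdigest()
    # Tolerate surrounding whitespace and uppercase hex in the provided value.
    return actual == expected_hex.strip().lower()
```

A caller would log a warning and skip caching whenever `verify_checksum`
returns False.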
"Phase 3":
- The checksum algorithm in Mesos is in a customizable Mesos Module
(http://mesos.apache.org/documentation/latest/modules/).
- It is selected by a string field in the CommandInfo::URI message that
indicates the module name.
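The selection-by-name idea in phase 3 can be pictured with a small registry.
This sketch does not model the real Mesos module API; the registry, the
function names, and the "sha1" module name are all assumptions made for
illustration.

```python
import hashlib
from typing import Callable, Dict

# A checksum "module" is modeled here as a function from bytes to hex digest.
ChecksumFn = Callable[[bytes], str]

_MODULES: Dict[str, ChecksumFn] = {}


def register_checksum_module(name: str, fn: ChecksumFn) -> None:
    """Make a checksum implementation available under a module name."""
    _MODULES[name] = fn


def checksum_with_module(name: str, content: bytes) -> str:
    """Look up the module named in the URI message and apply it."""
    try:
        fn = _MODULES[name]
    except KeyError:
        raise LookupError(f"no checksum module named {name!r}") from None
    return fn(content)


# Example registration; the name "sha1" is purely illustrative.
register_checksum_module("sha1", lambda data: hashlib.sha1(data).hexdigest())
```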
> Fetcher cache file verification, updating and invalidation
> ----------------------------------------------------------
>
> Key: MESOS-2073
> URL: https://issues.apache.org/jira/browse/MESOS-2073
> Project: Mesos
> Issue Type: Improvement
> Components: fetcher, slave
> Reporter: Bernd Mathiske
> Assignee: Bernd Mathiske
> Priority: Minor
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> The other tickets in the fetcher cache epic do not necessitate a check sum
> (e.g. MD5, SHA*) for files cached by the fetcher. Whereas such a check sum
> could be used to verify whether the file arrived without unintended
> alterations, it can first and foremost be employed to detect and trigger
> updates.
> Scenario: If a URI is requested for fetching and the indicated download has
> the same check sum as the cached file, then the cache file will be used and
> the download forgone. If the check sum is different, then fetching proceeds
> and the cached file gets replaced.
> This capability will be indicated by an additional field in the URI protobuf.
> Details TBD, i.e. to be discussed in comments below.
> In addition to the above, even if the check sum is the same, we can support
> voluntary cache file invalidation: a fresh download can be requested, or the
> caching behavior can be revoked entirely.