> On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > This really is an impressive project Bernd, tipping my hat here! > > > > I did not get all the way through, had to ignore the tests in this review > > (to be continued!). > > > > For playing a bit with this and for testing the functionality, it would be > > great if this was rebased to the current master. > > > > Some high level comments; > > I kind of like the idea of wrapping synchronous calls into async's, thus > > allowing error cases to fall through then-cases and hence enabling an error > > handling that isnt scattered all over. While you are now able to > > concentrate the failure handling, there is a cost involved; 1. it becomes > > harder to read, 2. it possibly becomes more prone to unforseen concurrency > > issues, 3. this launches one-off processes a lot lot lot :).
1. Compared to what alternative? 2. Compared to what alternative? 3. "Launching" one-off processes should not be a problem if we have our thread pool implemented well, which I am assuming. Are you referring to the many async() calls? This stuff is supposed to be cheap and available :-) > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.hpp, line 122 > > <https://reviews.apache.org/r/30774/diff/18/?file=875546#file875546line122> > > > > Why are we exposing the FetchProcess definition within this header? In > > other words, does anyone outside the Fetcher implementation need to know > > about this process' symbols? Yes: tests. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, line 1073 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line1073> > > > > Why using `auto` here - is this sanctioned by our styleguide? I have 2 ship its for the style guide change. Please commit it :-) https://reviews.apache.org/r/30792/ > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, line 1084 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line1084> > > > > Clang-format seems to prefer not to indend a function name after its > > return value at all. I have never encountered such case before, hence I am > > entirely unsure about the mesos (aka __right__) way. > > > > This appears to be triggered mostly by the fact that you are using > > nested class definitions - I would try to avoid that unless your > > implementation and the definitions are in one place as the resulting > > symbols are really hard to identify on a single glance. If I don't nest them they need longer names. That saves me "::" per class name and leaves more room for interpretation. We are not using Clang-format, so this is a non-issue. (I am convinced we should move to an automatic formatter yesterday, for the record). > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/launcher/fetcher.cpp, line 138 > > <https://reviews.apache.org/r/30774/diff/18/?file=875542#file875542line138> > > > > Why are you mentioning `os::net` to the user? Carried this over from old fetcher code. Removed now. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/launcher/fetcher.cpp, line 236 > > <https://reviews.apache.org/r/30774/diff/18/?file=875542#file875542line236> > > > > Not yours but this sounds weird, what do you think? It now says: // We regard as local: "file://" or the absense of any URI scheme. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/launcher/fetcher.cpp, lines 425-426 > > <https://reviews.apache.org/r/30774/diff/18/?file=875542#file875542line425> > > > > Is this part of the comment an artifact of older revisions? The exit > > code does not seem to correlate with the number of fetched items. Yep. I will bring this back later. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.hpp, lines 19-20 > > <https://reviews.apache.org/r/30774/diff/18/?file=875546#file875546line19> > > > > I know this particular issue has not yet gotten a final vote, but I > > would suggest using > > `__SLAVE_CONTAINERIZER_FETCHER_HPP__` instead. Done. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, lines 148-151 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line148> > > > > Can we get rid of this validation here or within `validateUri`? > > > > In any case, please make sure the resulting Error messages are > > consistent. Carried over from old fetcher code. Now compressed into one place. :-) > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, lines 236-239 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line236> > > > > Are these URIs canonicalized towards lower case already at some point? > > > > According to the standard, a mixed case URI / scheme is perfectly > > valid: `Http://example.org/example.tar.gz`. > > > > See http://tools.ietf.org/html/std66#section-3.1 again :). I just carried over how it used to be. Wanna file an improvement ticket? > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, line 358 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line358> > > > > Couldnt T get deduced from the async returned container (Try<T>) -- in > > other words, do you __need__ to call `tryToFuture<Nothing>(..)` instead of > > letting the compiler select the right one for you (`tryToFuture(..)`)? My C++ compiler is not smart enough for this and rejects it. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, lines 496-502 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line496> > > > > How about: > > > > ``` > > .then(defer( > > self(), > > &FetcherProcess::__fetch, > > containerId, > > lambda::_1, > > ... > > ``` > > > > or > > > > ``` > > .then(defer(self(), > > &FetcherProcess::__fetch, > > containerId, > > lambda::_1, > > ... > > ``` > > ? Fixed. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, line 527 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line527> > > > > Would you mind adding a comment on how could we possibly end up here > > with non existing paths? I cannot find out what line of code you are referring to. RB does not show me a line number or some source code that matches. If you mean how we can end up with a deletion failure, this could be due to hardware failure as reported by a disk driver, for example. > On Feb. 25, 2015, 2:20 p.m., Till Toenshoff wrote: > > src/slave/containerizer/fetcher.cpp, line 809 > > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line809> > > > > This appears to be a use case for `discard` as you are no longer > > interested in the results and, if possible, wish to abort any actions, > > right? > > > > Such discard however certainly needs to be taken care of within your > > entire chain (which currently only addresses `onFailure` but not > > `onDiscard`). The action has already happened here. We are merely reporting now that it has failed. - Bernd ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30774/#review73462 ----------------------------------------------------------- On Feb. 25, 2015, 9:54 a.m., Bernd Mathiske wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30774/ > ----------------------------------------------------------- > > (Updated Feb. 25, 2015, 9:54 a.m.) > > > Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and > Timothy Chen. > > > Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and > MESOS-2074 > https://issues.apache.org/jira/browse/MESOS-2057 > https://issues.apache.org/jira/browse/MESOS-2069 > https://issues.apache.org/jira/browse/MESOS-2070 > https://issues.apache.org/jira/browse/MESOS-2072 > https://issues.apache.org/jira/browse/MESOS-2073 > https://issues.apache.org/jira/browse/MESOS-2074 > > > Repository: mesos > > > Description > ------- > > Almost all of the functionality in epic MESOS-336. Downloaded files from > CommandInfo::URIs can now be cached in a cache directory designated by a > slave flag. This only happens when asked for by an extra flag in the URI and > is thus backwards-compatible. The cache has a size limit also given by a new > slave flag. Cache-resident files are evicted as necessary to make space for > newly fetched ones. Concurrent attempts to cache the same URI leads to only > one download. The fetcher program remains external for safety reasons, but is > now augmented with more elaborate parameters packed into a JSON object to > implement specific fetch actions for all of the above. Additional testing > includes fetching from (mock) HDFS and coverage of the new features. > > > Diffs > ----- > > docs/fetcher.md PRE-CREATION > include/mesos/fetcher/fetcher.proto > facb87b92bf3194516f636dcc348e136af537721 > include/mesos/mesos.proto 507845c493f65e154214fc7e562206e452990469 > src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 > src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b > src/launcher/fetcher.cpp 5b2d86d1867b25bf71461fd73df91b089002325b > src/slave/constants.hpp 12d6e92b814076fd423a7738e030dc6ce6954761 > src/slave/containerizer/docker.hpp 70114dcef7a416ae904e95eb9ce365bcd866aebc > src/slave/containerizer/docker.cpp 813ddf8c98622748b2da1b739bf122387abc4c79 > src/slave/containerizer/fetcher.hpp > bfd98dbe16e2bd5df3e2c8e9b10e303654f33446 > src/slave/containerizer/fetcher.cpp > 6e6bce08d76bb8a5813c905e3ffeff9b2411fd6d > src/slave/containerizer/mesos/containerizer.hpp > 074a2d82dcd882e52f8cd62ed7493295596acb26 > src/slave/containerizer/mesos/containerizer.cpp > d5b90d12d63becfeb4c3efa9c6f5d826417f582c > src/slave/flags.hpp ddb32593566b71bcf0b66adeeb889e43ef911084 > src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 > src/tests/docker_containerizer_tests.cpp > 8b212d4e6ed5a179ebadce1bdbbf3edde87d706c > src/tests/fetcher_cache_tests.cpp PRE-CREATION > src/tests/fetcher_tests.cpp fcbf7ad912f0b1b3d106a75a9dcb8213a0c69c07 > src/tests/mesos.hpp 60c70043c8266a422ffa03ab5a949da0bc822124 > src/tests/mesos.cpp d76ed8c837da162d782de84e935fe8e11e01d6b0 > src/tests/mock_hadoop.sh PRE-CREATION > > Diff: https://reviews.apache.org/r/30774/diff/ > > > Testing > ------- > > make check > > --- longer Description: --- > > -Replaces all other reviews for the fetcher cache except those related to > stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, > 30618, 30621, 30626. See descriptions of those. In dependency order: > > 30033: Removes the fetcher env tests since these won't be needed any more > when the fetcher uses JSON in a single env var as a parameter. They never > tested anything that won't be covered by other tests anyway. > > 30034: Makes the code structure of all fetcher tests the same. Instead of > calling the run method of the fetcher directly, calling through fetch(). Also > removes all uses of I/O redirection, which is not really needed for > debugging, and thus the next patch can refactor fetch() and run(). (The > latter comes in two varieties, which complicates matters without much > benefit.) > > 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field > that will later cause fetcher cache actions. Also introduces the notion of a > cache directory to the fetcher info protobuf. And then propagates these > additions throughout the rest of the code base where applicable. This > includes passing the slave ID all the way down to the place where the cache > dir name is constructed. > > 30037: Extends the fetcher info protobuf with "actions" (fetch directly > bypassing the cache, fetch through the cache, retrieve from the cache). > Switches the basis for dealing with uris to "items", which contain the uri, > the action, and potentially a cache file name. Refactors fetch() and run(), > so there is only one of each. Introduces about half of the actual cache > logic, including a hashmap of cache file objects for bookkeeping and basic > operations on it. > > 30039: Enables fetcher cache actions in the mesos fetcher program. > > 30006: Enables concurrent downloading into the fetcher cache. Reuse of > download results in the cache when multiple fetcher runs occur concurrently. > > 30614: This is to ensure that all this refactoring of fetcher code has not > broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos > and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client > written in bash that acts just like a real "hadoop" command if used in the > right limited way. > > 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. > This implements recovery in an acceptable, yet most simple way. > > 30173: Created fetcher cache tests. Adds a new test source file containing a > test fixture and tests to find out if the fetcher cache works with a variety > of settings. > > 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string > that contains the file size for the URI passed as argument. This is needed to > determine the size of a file on HDFS before downloading it to the fetcher > cache (to ensure there is enough space). > > 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type > separation code (distinguishes http, hdfs, local copying, etc.) from > mesos-fetcher to the fetcher process/actor, since it is going to be reused by > download size queries when we introduce fetcher cache management. Also > factored out URI validation, which will be used the same way by mesos-fetcher > and the fetcher process/actor. > > 30626: Fetcher cache eviction. This happens when the cache does not have > enough space to accomodate upcoming downloads to the cache. Necessary > provisions included here: > - mesos-fetcher does not run until evictions have been successful > - Cache space is reserved while (async) waiting for eviction to succeed. If > it fails, the reservation gets undone. > - Reservations can be partly from available space, partly from evictions. All > math included :-) > - To find out how much space is needed, downloading has a prelude in which we > query the download size from the URI. This works for all URI types that > mesos-fetcher currently supports, including http and hdfs. > - Size-determination requests are now synchronized, too. Only one per URI in > play happens. > - There is cleanup code for all kinds of error situations. At the very end of > the fetch attempt, each list is processed for undoing things like space > reservations and eviction disabling. > - Eviction gets disabled for URIs that are currently in use, i.e. the related > cache files are. We use reference counting for this, since there may be > concurrent fetch attempts using the same cache files. > > > Thanks, > > Bernd Mathiske > >