> On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote:
> > src/tests/fetcher_cache_tests.cpp, line 438
> > <https://reviews.apache.org/r/30774/diff/1/?file=857937#file857937line438>
> >
> >     There is a ton of state, expectations, control flow requirements, etc, 
> > throughout these tests. This makes them very hard to follow, and thus hard 
> > to understand if you're really testing the things you want to test (i.e., 
> > concurrent requests, etc). While it's probably more code, I'd rather have 
> > seen each test case testing exactly what you wanted to test rather than a 
> > bunch of functions pulled out to cover "commonality". It makes it very 
> > difficult for me to determine if certain test situations are covered!
> >     
> >     For example, in this function you await tasks if in "concurrent" mode. 
> > Yet, I'm pretty sure tasks aren't going to run in parallel unless there are 
> > enough resources to do so, and tracking down the size of the tasks and how 
> > many resources were launched with the slave is so non-local that it 
> > requires looking all over this source file and after doing so I'm not 
> > convinced that anything can actually run in parallel since the default 
> > slave has 2 CPUs and each task takes 1 CPU (so maybe 2 tasks run in 
> > parallel but not all of them?).
> >     
> >     Why not be more explicit in the tests, even if it's more code, so that 
> > we're convinced we're testing the things we want to test? My guess is that 
> > there will still be things we can pull out and abstract after having done 
> > that.
> 
> Bernd Mathiske wrote:
>

Indeed, we are not testing everything that needs to be tested yet. More tests 
will be added. 

Each test does state in a comment atop the test method what it is exactly that 
it tests. I have removed some the commonality functions and put their content 
into Setup() and TearDown(). What remains is "executeTasks()" and 
"await<SuchAndSuch>()". Inlining these would not make anything more readable 
IMHO.

I think that in the latest update I have now established that the code in 
_fetch() that is responsible for sorting out concurrent fetching is 
well-exercised. (See RR update message and new comments in the code.) Hardware 
parallelism is not relevant here, nor is that any tasks execute in parallel. In 
fact, I am trying hard to let no task run at all before fetching has become 
concurrent.

There are always enough resources to run "concurrently" in the sense needed 
here. See Setup() and CommandInfo usage. We never use them up.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review71575
-----------------------------------------------------------


On Feb. 17, 2015, 1:35 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2015, 1:35 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and 
> Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from 
> CommandInfo::URIs can now be cached in a cache directory designated by a 
> slave flag. This only happens when asked for by an extra flag in the URI and 
> is thus backwards-compatible. The cache has a size limit also given by a new 
> slave flag. Cache-resident files are evicted as necessary to make space for 
> newly fetched ones. Concurrent attempts to cache the same URI leads to only 
> one download. The fetcher program remains external for safety reasons, but is 
> now augmented with more elaborate parameters packed into a JSON object to 
> implement specific fetch actions for all of the above. Additional testing 
> includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   include/mesos/fetcher/fetcher.proto 
> facb87b92bf3194516f636dcc348e136af537721 
>   include/mesos/mesos.proto 3a2921dff856d37455593dcbf7340aa537997d35 
>   src/Makefile.am 0fe4809542a7d23785a221901947771320b43d52 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 5b2d86d1867b25bf71461fd73df91b089002325b 
>   src/slave/constants.hpp 7717abd4f9a3bd6fdca6af2364864e374ce2e056 
>   src/slave/containerizer/docker.hpp 70114dcef7a416ae904e95eb9ce365bcd866aebc 
>   src/slave/containerizer/docker.cpp 813ddf8c98622748b2da1b739bf122387abc4c79 
>   src/slave/containerizer/fetcher.hpp 
> bfd98dbe16e2bd5df3e2c8e9b10e303654f33446 
>   src/slave/containerizer/fetcher.cpp 
> 6e6bce08d76bb8a5813c905e3ffeff9b2411fd6d 
>   src/slave/containerizer/mesos/containerizer.hpp 
> b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d 
>   src/slave/containerizer/mesos/containerizer.cpp 
> fa40d47aee7803833bcde6cce1e86a21d7cf27d0 
>   src/slave/flags.hpp f6033355d129f0013d39dd053455c936596bf159 
>   src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f 
>   src/tests/docker_containerizer_tests.cpp 
> 8b212d4e6ed5a179ebadce1bdbbf3edde87d706c 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp fcbf7ad912f0b1b3d106a75a9dcb8213a0c69c07 
>   src/tests/mesos.hpp 83a369968ab2403fa341829ac5d11f7243095190 
>   src/tests/mesos.cpp 21a405366f56c963611324076efe775f85b9d9f7 
>   src/tests/mock_hadoop.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to 
> stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 
> 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more 
> when the fetcher uses JSON in a single env var as a parameter. They never 
> tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of 
> calling the run method of the fetcher directly, calling through fetch(). Also 
> removes all uses of I/O redirection, which is not really needed for 
> debugging, and thus the next patch can refactor fetch() and run(). (The 
> latter comes in two varieties, which complicates matters without much 
> benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field 
> that will later cause fetcher cache actions. Also introduces the notion of a 
> cache directory to the fetcher info protobuf. And then propagates these 
> additions throughout the rest of the code base where applicable. This 
> includes passing the slave ID all the way down to the place where the cache 
> dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly 
> bypassing the cache, fetch through the cache, retrieve from the cache). 
> Switches the basis for dealing with uris to "items", which contain the uri, 
> the action, and potentially a cache file name. Refactors fetch() and run(), 
> so there is only one of each. Introduces about half of the actual cache 
> logic, including a hashmap of cache file objects for bookkeeping and basic 
> operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of 
> download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not 
> broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos 
> and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client 
> written in bash that acts just like a real "hadoop" command if used in the 
> right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. 
> This implements recovery in an acceptable, yet most simple way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a 
> test fixture and tests to find out if the fetcher cache works with a variety 
> of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string 
> that contains the file size for the URI passed as argument. This is needed to 
> determine the size of a file on HDFS before downloading it to the fetcher 
> cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type 
> separation code (distinguishes http, hdfs, local copying, etc.) from 
> mesos-fetcher to the fetcher process/actor, since it is going to be reused by 
> download size queries when we introduce fetcher cache management. Also 
> factored out URI validation, which will be used the same way by mesos-fetcher 
> and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have 
> enough space to accomodate upcoming downloads to the cache. Necessary 
> provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If 
> it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All 
> math included :-)
> - To find out how much space is needed, downloading has a prelude in which we 
> query the download size from the URI. This works for all URI types that 
> mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in 
> play happens.
> - There is cleanup code for all kinds of error situations. At the very end of 
> the fetch attempt, each list is processed for undoing things like space 
> reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related 
> cache files are. We use reference counting for this, since there may be 
> concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>

Reply via email to