> On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/launcher/fetcher.cpp, line 388 > > <https://reviews.apache.org/r/30774/diff/1/?file=857926#file857926line388> > > > > Kill '} else {'.
That would be wrong. Then the CHECK would fire every time. > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/slave/containerizer/fetcher.cpp, line 75 > > <https://reviews.apache.org/r/30774/diff/1/?file=857931#file857931line75> > > > > This error message is a bit little wierd since it's not that it can't > > be removed, it's that we can determine it's 'realpath'. Also, why not use > > that for the actual os::rmdir below? This was accidental merge mess when I combined and rebased patches. Sorry! Cleaned up now. > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/slave/containerizer/fetcher.cpp, line 323 > > <https://reviews.apache.org/r/30774/diff/1/?file=857931#file857931line323> > > > > Were you explicitly trying to do this asynchronously? Since it could be > > an expensive operation (filesystem) I'm totally for it but just wanted to > > confirm. Yes, any file operation needs to be async! I have sometimes encountered bad blocks on hard disks and experienced that when that happens any file operation that touches one may stall for minutes. Besides, we could be using a network-attached/mounted file system with arbitrarily long timeouts. Etc. > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/tests/fetcher_cache_tests.cpp, line 220 > > <https://reviews.apache.org/r/30774/diff/1/?file=857937#file857937line220> > > > > This should occur in the HttpServer but then you should have an > > HttpServerProcess. When does process terminate? How are you sure that > > libprocess is not calling something like 'visit' after the memory for this > > class has been cleaned up!!! I added this comment to the class to clarify what the intent is here: // A minimal HTTP server, not intended as an actor, just reusing what // is already implemented somewhere to serve some HTTP requests for // file downloads, plus counting how many requests are made. There is an explicit terminate() call before Shutdown() in every test that uses this. I will add this to the fixture when creating one for this kind of test. BTW, this is similar to this stretch of code in process_tests.cpp: class FileServer : public Process<FileServer> { public: explicit FileServer(const string& _path) : path(_path) {} virtual void initialize() { provide("", path); } const string path; }; ... FileServer server(path); PID<FileServer> pid = spawn(server); ... terminate(server); wait(server); --- Should I add wait() after terminate() in my tests? > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/tests/fetcher_cache_tests.cpp, line 242 > > <https://reviews.apache.org/r/30774/diff/1/?file=857937#file857937line242> > > > > I don't like the mixture of actors and threads here. Even if this is > > only in the tests, it sets a bad precedent that we're willing compromise on > > the integrity of the assumptions you have when executing within an actor. > > > > There is also a lot of expectations shared between the implementation > > here and the tests themselves. For example, 'pause' is meant to be used in > > order to enable queuing up multiple pending cache entry downloads. > > > > The abstraction I think you want here is something that lets you > > control what happens whenever we get a 'visit' callback, and to do that we > > really need to mock 'visit'. If you had a mocked 'visit' then you can > > explicitly control "how", and most importantly, "when" each 'visit' call > > occurs. The HttpServer here is not an "actor". I am just reusing some code to produce an HttpServer as quickly as possible. I think the problem I did miss is to establish that tasks have in fact queued up. Rather than just observing this when debugging. I may still need pausing the HTTP request handling then. Or at least some HTTP server to test the net::-related code paths in mesos-fetcher. I don't see how a mocked method would have more control over when the fetcher does anything than an overridden method. > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/tests/fetcher_cache_tests.cpp, line 438 > > <https://reviews.apache.org/r/30774/diff/1/?file=857937#file857937line438> > > > > There is a ton of state, expectations, control flow requirements, etc, > > throughout these tests. This makes them very hard to follow, and thus hard > > to understand if you're really testing the things you want to test (i.e., > > concurrent requests, etc). While it's probably more code, I'd rather have > > seen each test case testing exactly what you wanted to test rather than a > > bunch of functions pulled out to cover "commonality". It makes it very > > difficult for me to determine if certain test situations are covered! > > > > For example, in this function you await tasks if in "concurrent" mode. > > Yet, I'm pretty sure tasks aren't going to run in parallel unless there are > > enough resources to do so, and tracking down the size of the tasks and how > > many resources were launched with the slave is so non-local that it > > requires looking all over this source file and after doing so I'm not > > convinced that anything can actually run in parallel since the default > > slave has 2 CPUs and each task takes 1 CPU (so maybe 2 tasks run in > > parallel but not all of them?). > > > > Why not be more explicit in the tests, even if it's more code, so that > > we're convinced we're testing the things we want to test? My guess is that > > there will still be things we can pull out and abstract after having done > > that. > On Feb. 8, 2015, 2:06 p.m., Benjamin Hindman wrote: > > src/tests/fetcher_cache_tests.cpp, lines 132-135 > > <https://reviews.apache.org/r/30774/diff/1/?file=857937#file857937line132> > > > > There is only "checkpointing" mode now, so don't bother trying to test > > both cases. I tried removing this distinction and tests failed, both the recovery test and the others that don't test it. Keeping the distinction for now. - Bernd ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30774/#review71575 ----------------------------------------------------------- On Feb. 8, 2015, 11:34 a.m., Bernd Mathiske wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30774/ > ----------------------------------------------------------- > > (Updated Feb. 8, 2015, 11:34 a.m.) > > > Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and > Timothy Chen. > > > Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, and MESOS-2074 > https://issues.apache.org/jira/browse/MESOS-2057 > https://issues.apache.org/jira/browse/MESOS-2069 > https://issues.apache.org/jira/browse/MESOS-2070 > https://issues.apache.org/jira/browse/MESOS-2072 > https://issues.apache.org/jira/browse/MESOS-2074 > > > Repository: mesos > > > Description > ------- > > Replaces all other reviews for the fetcher cache except those related to > stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, > 30618, 30621, 30626. See descriptions of those. > > > Diffs > ----- > > include/mesos/fetcher/fetcher.proto > facb87b92bf3194516f636dcc348e136af537721 > include/mesos/mesos.proto 3a2921dff856d37455593dcbf7340aa537997d35 > src/Makefile.am 0fe4809542a7d23785a221901947771320b43d52 > src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b > src/launcher/fetcher.cpp 5b2d86d1867b25bf71461fd73df91b089002325b > src/slave/constants.hpp 7717abd4f9a3bd6fdca6af2364864e374ce2e056 > src/slave/containerizer/docker.hpp 70114dcef7a416ae904e95eb9ce365bcd866aebc > src/slave/containerizer/docker.cpp 813ddf8c98622748b2da1b739bf122387abc4c79 > src/slave/containerizer/fetcher.hpp > bfd98dbe16e2bd5df3e2c8e9b10e303654f33446 > src/slave/containerizer/fetcher.cpp > 6e6bce08d76bb8a5813c905e3ffeff9b2411fd6d > src/slave/containerizer/mesos/containerizer.hpp > b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d > src/slave/containerizer/mesos/containerizer.cpp > fa40d47aee7803833bcde6cce1e86a21d7cf27d0 > src/slave/flags.hpp f6033355d129f0013d39dd053455c936596bf159 > src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f > src/tests/docker_containerizer_tests.cpp > 8b212d4e6ed5a179ebadce1bdbbf3edde87d706c > src/tests/fetcher_cache_tests.cpp PRE-CREATION > src/tests/fetcher_tests.cpp fcbf7ad912f0b1b3d106a75a9dcb8213a0c69c07 > src/tests/mesos.hpp 83a369968ab2403fa341829ac5d11f7243095190 > src/tests/mesos.cpp 21a405366f56c963611324076efe775f85b9d9f7 > src/tests/mock_hadoop.sh PRE-CREATION > > Diff: https://reviews.apache.org/r/30774/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Bernd Mathiske > >