Meeting notes about reservations
Hi, BenM, BenH, MPark and I had a sync today discussing reservations (dynamic+static) in Mesos. It was a very productive meeting and here are some notes I took from the meeting. I think all of us are on the same page now and are happy with the current design. - Jie 1) we still keep the static reservation, but we don't introduce a new reservation type for that. If role = R and reservation is none, then it's statically reserved (cannot be released by anyone). 2) /reserve and /unreserve endpoints. Specify the role, slaveid (or hostname, or both, and we validate), resources when reserving resources. It only takes resources from static * resources. The operator can reserve both OP and FW reservations. When hit the reserve endpoint, we use the offered resources and invoke the offer rescind mechanism. Question: should we release dynamic reservations when a framework is shutting down? Multiple frameworks might share the same role! 3) reservation id We might want to have a reservation id (the id can be the framework id as well). The goal is to distinguish who reserved this resources if two frameworks are under the same role. 4) introduce principal for reservation The idea is that we need to track who reserved a resource so that we can decide who can unreserve it. The principal and ACLs can help us setup those rules for deciding who can unreserve what. 5) protobuf message Resources { ... message Reservation { optional string principal; optional string reservation_id; // introduce it later maybe } ... optional string role [ Default = * ]; optional Reservation reservation; }
Re: Review Request 31000: stout: Moved MAC and IP classes to stout/mac.hpp and stout/ip.hpp
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31000/#review73367 --- Thanks! This is great! Can we commit this first? (and libprocess and mesos ones) - Jie Yu On Feb. 21, 2015, 12:02 a.m., Evelina Dumitrescu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31000/ --- (Updated Feb. 21, 2015, 12:02 a.m.) Review request for mesos and Dominic Hamon. Repository: mesos Description --- see summary Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp PRE-CREATION 3rdparty/libprocess/3rdparty/stout/include/stout/mac.hpp PRE-CREATION 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp b464e517bb1e7b6b381c6cd6c0466ed788a82615 3rdparty/libprocess/3rdparty/stout/tests/ip_tests.cpp PRE-CREATION 3rdparty/libprocess/3rdparty/stout/tests/mac_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/31000/diff/ Testing --- Thanks, Evelina Dumitrescu
Re: Help us review #MesosCon 2015 proposals
+1 It does not allow me to continue to see the second page if I don't make choices on the first one. - Jie On Fri, Feb 20, 2015 at 4:54 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Great to see so many proposals! Is it intentional that we have to review them in small subsets? It's hard to tell what to consider as an Average proposal when you can only see a small subset at a time. Just curious on the reasoning behind that. On Wed, Feb 18, 2015 at 2:44 PM, Dave Lester d...@davelester.org wrote: A total of 63 proposals were submitted for #MesosCon[1], up significantly from 24 submitted for last year’s conference. Similar to last year, the MesosCon program committee is opening these proposals up for community review/feedback to better-inform our decisions about what should be included in the program. In order to make it easier to review a subset of the proposals, we’ve segmented them based upon three loose themes: Frameworks, Users / Ops, and Mesos Internals and Extensions. We encourage you to review proposals based upon one theme, or all three! *Frameworks (18 Proposals):* bit.ly/MesosCon2015Frameworks Talks on how frameworks can be used, developed, and integrate with Mesos. *Users / Ops (28 Proposals):* bit.ly/MesosCon2015UsersOps A combination of talks that are use cases (how company x uses Mesos), and operations-focused (how we deploy x, use Docker, etc). *Mesos Internals and Extensions (17 Proposals):* bit.ly/MesosCon2015InternalsExt Features of the Mesos core, or software integrations with the internals of Mesos. Some proposals have overlap with frameworks and ops, but most are focused on the foundational aspects of how Mesos works. The forms above also include an opportunity to indicate which sessions you didn't see proposed but would like to attend. Thanks in advance for your participation! The forms will close on March 4th 2015, two weeks from today. Dave Links: 1. http://mesoscon.org
Re: Review Request 30911: Updated the filter abstraction for Resources.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/#review73294 --- You may wanna do a rebase. LGTM, will give a shipit once you updated it. Thanks Mpark! include/mesos/resources.hpp https://reviews.apache.org/r/30911/#comment119561 We don't use snake_case for function names. Please ``` s/persitent_volumes/persistentVolumes ``` src/master/allocator/mesos/hierarchical.hpp https://reviews.apache.org/r/30911/#comment119562 Can we punt on that change right now? It does not seem to simply the code and it's actually more hard to read. src/master/master.hpp https://reviews.apache.org/r/30911/#comment119563 s/isCheckpoint/needCheckpointing/ ? src/slave/slave.cpp https://reviews.apache.org/r/30911/#comment119564 Can we simply the logic here to be: ``` Resources resources = task.resources(); if (task.has_executor()) { resources += task.executor().resources(); } CHECK(checkpointedResources.contains(resources.persistentVolumes()) ...; ``` - Jie Yu On Feb. 19, 2015, 2:54 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/ --- (Updated Feb. 19, 2015, 2:54 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Bugs: MESOS-2348 https://issues.apache.org/jira/browse/MESOS-2348 Repository: mesos Description --- See [JIRA Ticket](https://issues.apache.org/jira/browse/MESOS-2348). Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/allocator/mesos/hierarchical.hpp 2680d6231927867d5a8d75cbc42b81d6c75fc7f2 src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/hierarchical_allocator_tests.cpp eeecfb64540b1915074aaffaa5d506b203bc src/tests/resources_tests.cpp 3f98782fd437dba808d720bf8e9b94b8fa7e0feb Diff: https://reviews.apache.org/r/30911/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 30911: Updated the filter abstraction for Resources.
On Feb. 20, 2015, 6:30 p.m., Jie Yu wrote: include/mesos/resources.hpp, lines 148-150 https://reviews.apache.org/r/30911/diff/3/?file=868705#file868705line148 We don't use snake_case for function names. Please ``` s/persitent_volumes/persistentVolumes ``` I am a little hesitate if we want to introduce this helper or not. In many cases, the code is much simpler if we do ``` foreach (const Resource resource, task.resources()) { if (!resource.isPersistentVolume()) { continue; } ... } ``` Also, looks like the following code is not too bad as well. ``` resources.filter(Resources::isPersistentVolume) ``` What do you think? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/#review73294 --- On Feb. 19, 2015, 2:54 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/ --- (Updated Feb. 19, 2015, 2:54 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Bugs: MESOS-2348 https://issues.apache.org/jira/browse/MESOS-2348 Repository: mesos Description --- See [JIRA Ticket](https://issues.apache.org/jira/browse/MESOS-2348). Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/allocator/mesos/hierarchical.hpp 2680d6231927867d5a8d75cbc42b81d6c75fc7f2 src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/hierarchical_allocator_tests.cpp eeecfb64540b1915074aaffaa5d506b203bc src/tests/resources_tests.cpp 3f98782fd437dba808d720bf8e9b94b8fa7e0feb Diff: https://reviews.apache.org/r/30911/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 30545: cgroups: added support to listen on memory pressures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/#review73359 --- The code looks great! Given that the tests are quite large. Can you split this patch and I'll give a shipit for the code. For tests, I need to do a pass next week. Thanks! THis is great, Chi! src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119660 Any reason this is not a static method of pressure::Listener? src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119661 Well, you don't need this friend declaration if 'create' is a static member funciton, right? src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119664 Look like only one place is using this constant. Sugguest killing it. src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119663 I think we indent 2 spaces for switch 'case'. src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119662 I think here should be able to do: ``` : coutner_(0), error(None()), process(new event::Listener( hierarchy, cgroup, memory.pressure_level, stringify(level))) {} ``` src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119668 Remove the tailing period. src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119669 We don't use std:: prefix in cpp files. Please remove them (do a sweep in this file). - Jie Yu On Feb. 20, 2015, 9:37 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/ --- (Updated Feb. 20, 2015, 9:37 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: MESOS-2136 https://issues.apache.org/jira/browse/MESOS-2136 Repository: mesos Description --- cgroups: added support to listen on memory pressures. Diffs - src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 src/linux/cgroups.hpp e07772fec11b9bb73a44c54326f24d7ee1e8a64b src/linux/cgroups.cpp a307e27c5840270e28cced2d5aeb90c6679bff1d src/tests/cgroups_tests.cpp 9d50a47c939f5f136c85fdcc555459512c6f2259 src/tests/memory_test_helper.hpp PRE-CREATION src/tests/memory_test_helper.cpp PRE-CREATION src/tests/memory_test_helper_main.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30545/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 29288: stout: Created IP address abstraction for different protocol families
On Feb. 18, 2015, 1:28 a.m., Jie Yu wrote: Glad to see this being worked on, Thanks for all the effort! This is a partial review. Let's address the following comments first and then I'll take a second pass. One high level comment is that we should separate this giant patch. Many changes are being mixed together, making it very hard for reviewers to digest. For example, you renamed `ip` to `IP::fromLinkDevice`. I think that can be pulled into a separate patch, right? Also, let's try not to introduce helper functions that are not currently being used. You can leave a TODO there. That can also bring this patch smaller. For example, you don't have to add the implementation for IPv6 case in `IP::fromLinkDevice` because it's not being used yet. You can always leave a TODO and fill the implementation later in a separate patch. That'll make the reviewer's life much better. I would suggest we first refactor `net::IP` and separate the concepts of IP address and IP network (see my comments below). No need to introduce the concept of family (and all IPv6 stuffs) yet in this patch. You need to update all the callsites that use `net::IP` (e.g., the port mapping isolator might want to take `net::IPNetwork` instead of `net::IP` in some cases). Let's start with this first! And then, in the second patch, you can make `net::IP` IPv6 capable. That means you need to store a union. Simply add TODO in those helper functions that you think you need to fill IPv6 implementaion in. Let's try to keep that patch small as well. Then, we'll see what we go from there. Does that sound good to you? Evelina Dumitrescu wrote: I appreciate a lot that you took a look at my patches! If we splitt the patch into IP and IPNetwork, then the netmask from the IPNetwork class should be optional. A situation where we need the net mask as optional is fromLinkDevice. We will end up with the IP and IPNetwork classes representing the same thing. I agree with splitting this patch and introduce the changes step by step. Can you expand on why fromLinkDevice needs netmask to be optional? Looking at the original logics, we only create a IPNetwork if both address and netmask exist. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29288/#review72771 --- On Feb. 12, 2015, 8:05 p.m., Evelina Dumitrescu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29288/ --- (Updated Feb. 12, 2015, 8:05 p.m.) Review request for mesos, Benjamin Hindman, Dominic Hamon, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Bugs: MESOS-1919 https://issues.apache.org/jira/browse/MESOS-1919 Repository: mesos Description --- Created the inner class InAddrStorage encapsulated inside the IP class. The class uses a union with the in_addr and in6_addr fields. I considered that the The MasterInfo protobuffers should have both an ipv4 and an ipv6 field. I intend to use the same Classifiers, addition, removal and update of container filters, but write different encode/decode functions for IPv4/ICMP and IPv6/ICMPv6 because the processing of the protocol headers differ. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp a0210ea6440086246aafe632f86498abbb70719a 3rdparty/libprocess/3rdparty/stout/tests/net_tests.cpp 425132e5d7c3770be4a5a39feea5a2f22179b871 Diff: https://reviews.apache.org/r/29288/diff/ Testing --- make check Thanks, Evelina Dumitrescu
Re: Review Request 30545: cgroups: added support to listen on memory pressures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/#review72971 --- First pass, haven't looked at the tests yet. The main issue is that Listener is not thread safe. You probably need to create a libprocess. src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119089 We usually put the comments above the forward declaration: ``` // Forward declaration. class Listener; ``` src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119090 +1 on Dominic's comment. Let's create a enum and simply expose the static method 'create' in pressure::Listener. You can kill the forward declaration as well. ``` enum Level { LOW, MEDIUM, CRITICAL }; TryOwnedListener pressure::Listener::create( const string hierarchy, const string cgroup, Level level); ``` src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119092 Remove On a single error..., and replace it with Please see comments of 'counter' for details. src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119099 What's the concurrency story here? Looks like this class is not thread safe? Some thread is reading counter/error while other thread can modify counter/error. I guess you still need a process here. src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119091 Returns a failure if any error occurs while monitoring the pressure events, and any subsequent calls to 'counter' will return the same failure. In that case, the user should consider create a new Listener. src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119093 See my comments above, please expose this function src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119095 s/read/listen/? Also, please use const ref for 'future' ``` const Futureuint64_t future ``` src/linux/cgroups.hpp https://reviews.apache.org/r/30545/#comment119098 Let's not conflate 'counter' and 'error'. Use two member fields: ``` OptionError error; uint64_t counter_; (initialize it to 0) ``` src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119094 Kill those functions. src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119096 s/read/listen src/linux/cgroups.cpp https://reviews.apache.org/r/30545/#comment119097 s/read/listen/ - Jie Yu On Feb. 17, 2015, 8:12 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/ --- (Updated Feb. 17, 2015, 8:12 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: MESOS-2136 https://issues.apache.org/jira/browse/MESOS-2136 Repository: mesos Description --- cgroups: added support to listen on memory pressures. Diffs - src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 src/linux/cgroups.hpp bf8efee05b995a37d4e6e7679a493b2d5238aa0b src/linux/cgroups.cpp b6f75b14dea7609b90627eccdd33ef891ed9d21f src/tests/cgroups_tests.cpp 9d50a47c939f5f136c85fdcc555459512c6f2259 src/tests/memory_test_helper.hpp PRE-CREATION src/tests/memory_test_helper.cpp PRE-CREATION src/tests/memory_test_helper_main.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30545/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 31024: Changed the slave logic so that it always waits for containerizer update to finish.
On Feb. 18, 2015, 6:53 p.m., Vinod Kone wrote: src/slave/slave.hpp, line 280 https://reviews.apache.org/r/31024/diff/1-3/?file=863795#file863795line280 s/flushTasks/runTasks/ so that we don't add a new verb flush? We already have 'runTask'. Sounds a little confusing if we have both 'runTask' and 'runTasks'. Because of that, I still prefer 'flushTasks'. What do you think? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/#review72975 --- On Feb. 18, 2015, 12:01 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/ --- (Updated Feb. 18, 2015, 12:01 a.m.) Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Changed the slave logic so that it always waits for containerizer update to finish. The key idea is to leverage `executor-queuedTasks` when containerizer-update is in progress. Diffs - include/mesos/type_utils.hpp 32dc6ed489f43ba3695507f59c4c2d94028c8df1 src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/master_tests.cpp e69a7fb39e93a9a999e0991985f5d060b72aac65 src/tests/registrar_zookeeper_tests.cpp 30d3cd8b1e7d56fbb326c9feccd4c3cbae2015de src/tests/scheduler_tests.cpp 080ec7236246794de6c8cc7356a06331c5e48cd5 src/tests/slave_tests.cpp 21709e2858d91a2746e0343786f73dec22735da6 Diff: https://reviews.apache.org/r/31024/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 31162: Backout change 30300 (Move internal protos from mesos::internal to mesos namespace)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/#review73025 --- include/mesos/authentication/authentication.hpp https://reviews.apache.org/r/31162/#comment119185 We don't do using namespace in headers. - Jie Yu On Feb. 18, 2015, 10:27 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/ --- (Updated Feb. 18, 2015, 10:27 p.m.) Review request for mesos, Ben Mahler, Jie Yu, Niklas Nielsen, Till Toenshoff, and Vinod Kone. Bugs: MESOS-2371 https://issues.apache.org/jira/browse/MESOS-2371 Repository: mesos Description --- This changeset puts the internal protos back into mesos::internal namespace. This is required for upgrade compatibility. Without it, a newer Master would reject all messages from an older Slave and vice-versa. https://reviews.apache.org/r/30300/ Diffs - include/mesos/authentication/authentication.hpp 699aa886286bc7d9c05592e71232ab1c1084871f include/mesos/authentication/authentication.proto 38a6f781d6a0b4618a14e1681420564d78b840a8 include/mesos/module/module.hpp e83be2822b7c0e7935ab1c8af36e5cb9b5180f20 include/mesos/module/module.proto 821fc0e72ece7c497595859fc5efc1c64ea49b9b include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc src/common/type_utils.cpp 12a36bbd7d7773b25dedf2d0d951c79e0b5141d6 src/log/consensus.hpp ee9e9081ffe2a5f18433d9eff1a2e2cf17c81863 src/log/leveldb.hpp 8f5df5bdfb08de02e4129494fba188f97c3b8010 src/log/leveldb.cpp 1d679425c6df3498be67d048bc05d02ffe03a636 src/log/replica.hpp b2602083564447cf75827257a2548c1b5aeb49e2 src/log/storage.hpp 5e81f4e9774ad239145019320cf991fcf51a9da5 src/master/registry.hpp d3cbe564dc8e14f3dd301e5bc06ed60d82856eee src/master/registry.proto 29a309763bca9db76c443d7c039ca152a2afbc5b src/messages/log.proto 12b3572e0fe0e30c5eff20c7af27eb3d7cfe8f4b src/messages/messages.hpp 25769b797ec9bd1a1c348c467616aef8407c45d6 src/messages/messages.proto 58484ae45071a80afd2b11803dd66a88f88ad9ed src/messages/state.proto 7fc4883b9fe590afbbb0261988c929c3b1f8f676 src/state/storage.hpp f5cd607b47d44f72d72cbd0c5adb7368410b83d8 src/tests/rate_limiting_tests.cpp 784ac76ee81918466de44196f5adc2b88adfee96 src/tests/state_tests.cpp b30812322dd8ae507e1d45912ab559940ac5ee73 Diff: https://reviews.apache.org/r/31162/diff/ Testing --- make check. Also tested with master running HEAD with this patch, and slave running 0.21.0 with src/test-framework. The tests succeed when src/test-framework is running from the same version as the slave. However, if I run src/test-framework from the master version, it doesn't succeed and errors out with unexpected TASK_LOST. Not sure if this is the desired behavior. Thanks, Kapil Arya
Re: Review Request 31162: Backout change 30300 (Move internal protos from mesos::internal to mesos namespace)
On Feb. 18, 2015, 10:42 p.m., Jie Yu wrote: include/mesos/authentication/authentication.hpp, lines 25-27 https://reviews.apache.org/r/31162/diff/1/?file=868264#file868264line25 We don't do using namespace in headers. Kapil Arya wrote: Yes, but this was to avoid changing numerous cpp files to include `using namespace`. Is it possible to make an exception until we can figure out a better way to handle `internal`? Can we first discuss what the right solution should be? And what's the upgrade story if we still want to move from mesos::internal to mesos? At least, we want to make sure we have a plausible way forward. Thoughts? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/#review73025 --- On Feb. 18, 2015, 10:27 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/ --- (Updated Feb. 18, 2015, 10:27 p.m.) Review request for mesos, Ben Mahler, Jie Yu, Niklas Nielsen, Till Toenshoff, and Vinod Kone. Bugs: MESOS-2371 https://issues.apache.org/jira/browse/MESOS-2371 Repository: mesos Description --- This changeset puts the internal protos back into mesos::internal namespace. This is required for upgrade compatibility. Without it, a newer Master would reject all messages from an older Slave and vice-versa. https://reviews.apache.org/r/30300/ Diffs - include/mesos/authentication/authentication.hpp 699aa886286bc7d9c05592e71232ab1c1084871f include/mesos/authentication/authentication.proto 38a6f781d6a0b4618a14e1681420564d78b840a8 include/mesos/module/module.hpp e83be2822b7c0e7935ab1c8af36e5cb9b5180f20 include/mesos/module/module.proto 821fc0e72ece7c497595859fc5efc1c64ea49b9b include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc src/common/type_utils.cpp 12a36bbd7d7773b25dedf2d0d951c79e0b5141d6 src/log/consensus.hpp ee9e9081ffe2a5f18433d9eff1a2e2cf17c81863 src/log/leveldb.hpp 8f5df5bdfb08de02e4129494fba188f97c3b8010 src/log/leveldb.cpp 1d679425c6df3498be67d048bc05d02ffe03a636 src/log/replica.hpp b2602083564447cf75827257a2548c1b5aeb49e2 src/log/storage.hpp 5e81f4e9774ad239145019320cf991fcf51a9da5 src/master/registry.hpp d3cbe564dc8e14f3dd301e5bc06ed60d82856eee src/master/registry.proto 29a309763bca9db76c443d7c039ca152a2afbc5b src/messages/log.proto 12b3572e0fe0e30c5eff20c7af27eb3d7cfe8f4b src/messages/messages.hpp 25769b797ec9bd1a1c348c467616aef8407c45d6 src/messages/messages.proto 58484ae45071a80afd2b11803dd66a88f88ad9ed src/messages/state.proto 7fc4883b9fe590afbbb0261988c929c3b1f8f676 src/state/storage.hpp f5cd607b47d44f72d72cbd0c5adb7368410b83d8 src/tests/rate_limiting_tests.cpp 784ac76ee81918466de44196f5adc2b88adfee96 src/tests/state_tests.cpp b30812322dd8ae507e1d45912ab559940ac5ee73 Diff: https://reviews.apache.org/r/31162/diff/ Testing --- make check. Also tested with master running HEAD with this patch, and slave running 0.21.0 with src/test-framework. The tests succeed when src/test-framework is running from the same version as the slave. However, if I run src/test-framework from the master version, it doesn't succeed and errors out with unexpected TASK_LOST. Not sure if this is the desired behavior. Thanks, Kapil Arya
Re: Review Request 31162: Backout change 30300 (Move internal protos from mesos::internal to mesos namespace)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/#review73066 --- The high level comment is: you should only put the namespace hacks in protobuf wrapper headers so that once we move forward, we can just go to all the protobuf header wrappers and remove those hacks. Rest the code base should not have any internal:: namespace (maybe one exception which is unfortunate). include/mesos/authentication/authentication.hpp https://reviews.apache.org/r/31162/#comment119249 Could you expand the comments here explaining why we need to do this include? Maybe link the ticket as well. Here and everywhere else. include/mesos/module/module.hpp https://reviews.apache.org/r/31162/#comment119250 Please move this to common/type_utils.hpp include/mesos/type_utils.hpp https://reviews.apache.org/r/31162/#comment119248 Instead of pulling it into module.hpp, can you keep it here? Also, if you include module/module.hpp, do you still need to change this code? src/common/parse.hpp https://reviews.apache.org/r/31162/#comment119263 This is a little unfortunate, could you please add a TODO here and link the ticket? src/common/type_utils.cpp https://reviews.apache.org/r/31162/#comment119251 I am wondering why this is necessary if you already included messages/messages.hpp (which has using namespace mesos::internal)? src/log/consensus.hpp https://reviews.apache.org/r/31162/#comment119253 Let's keep all the namespace hacks in protobuf wrapper headers so that it's easy for us to find them later. In this case, you can put `using namespace mesos::internal::log` in messages/log.hpp. src/log/leveldb.hpp https://reviews.apache.org/r/31162/#comment119260 Ditto above. src/log/leveldb.cpp https://reviews.apache.org/r/31162/#comment119259 Ditto above. src/log/replica.hpp https://reviews.apache.org/r/31162/#comment119254 Ditto above. src/messages/messages.hpp https://reviews.apache.org/r/31162/#comment119256 Curious why these are not in type_utils? src/state/storage.hpp https://reviews.apache.org/r/31162/#comment119258 Let's do the similiar thing here (like the log above). Kill this and put that into messages/state.hpp. src/tests/rate_limiting_tests.cpp https://reviews.apache.org/r/31162/#comment119261 I would suggest just change this comment to not include the namespace so that we don't forget to change this once we move forward. ``` Message RegisterFrameworkMessage ... ``` Please do a sweep to find all such cases and fix them. src/tests/state_tests.cpp https://reviews.apache.org/r/31162/#comment119262 typedef Registry::Slaves Slaves; typedef Registry::Slave Slave; - Jie Yu On Feb. 19, 2015, 12:59 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31162/ --- (Updated Feb. 19, 2015, 12:59 a.m.) Review request for mesos, Ben Mahler, Jie Yu, Niklas Nielsen, Till Toenshoff, and Vinod Kone. Bugs: MESOS-2371 https://issues.apache.org/jira/browse/MESOS-2371 Repository: mesos Description --- This changeset puts the internal protos back into mesos::internal namespace. This is required for upgrade compatibility. Without it, a newer Master would reject all messages from an older Slave and vice-versa. https://reviews.apache.org/r/30300/ Diffs - include/mesos/authentication/authentication.hpp 699aa886286bc7d9c05592e71232ab1c1084871f include/mesos/authentication/authentication.proto 38a6f781d6a0b4618a14e1681420564d78b840a8 include/mesos/module/module.hpp e83be2822b7c0e7935ab1c8af36e5cb9b5180f20 include/mesos/module/module.proto 821fc0e72ece7c497595859fc5efc1c64ea49b9b include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc src/common/type_utils.cpp 12a36bbd7d7773b25dedf2d0d951c79e0b5141d6 src/log/consensus.hpp ee9e9081ffe2a5f18433d9eff1a2e2cf17c81863 src/log/leveldb.hpp 8f5df5bdfb08de02e4129494fba188f97c3b8010 src/log/leveldb.cpp 1d679425c6df3498be67d048bc05d02ffe03a636 src/log/replica.hpp b2602083564447cf75827257a2548c1b5aeb49e2 src/log/storage.hpp 5e81f4e9774ad239145019320cf991fcf51a9da5 src/master/registry.hpp d3cbe564dc8e14f3dd301e5bc06ed60d82856eee src/master/registry.proto 29a309763bca9db76c443d7c039ca152a2afbc5b src/messages/log.proto 12b3572e0fe0e30c5eff20c7af27eb3d7cfe8f4b src/messages/messages.hpp 25769b797ec9bd1a1c348c467616aef8407c45d6 src/messages/messages.proto 58484ae45071a80afd2b11803dd66a88f88ad9ed src/messages/state.proto
Re: Review Request 31008: Refactored EventListener to allow for continuous monitoring of an event.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/#review72866 --- src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118987 This is still buggy right? You call fd.get() even if fd is an error? src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118988 Please update the comments. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118986 Indent. - Jie Yu On Feb. 18, 2015, 1:14 a.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/ --- (Updated Feb. 18, 2015, 1:14 a.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: mesos-2136 https://issues.apache.org/jira/browse/mesos-2136 Repository: mesos Description --- Refactored EventListener to allow for continuous monitoring of an event. Diffs - src/linux/cgroups.hpp bf8efee05b995a37d4e6e7679a493b2d5238aa0b src/linux/cgroups.cpp b6f75b14dea7609b90627eccdd33ef891ed9d21f Diff: https://reviews.apache.org/r/31008/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 31024: Changed the slave logic so that it always waits for containerizer update to finish.
On Feb. 17, 2015, 7:09 p.m., Vinod Kone wrote: src/slave/slave.cpp, line 1695 https://reviews.apache.org/r/31024/diff/1/?file=863796#file863796line1695 Not yours, but can you add a comment here: // Send terminal update which removes the task from 'queuedTasks'. I think there is a NOTE above already. I'll move it down to here. On Feb. 17, 2015, 7:09 p.m., Vinod Kone wrote: src/slave/slave.cpp, lines 1738-1747 https://reviews.apache.org/r/31024/diff/1/?file=863796#file863796line1738 Hmmm. I don't think we want to do this. If an executor has been running for a while, and all its launched tasks have terminated, we want it to keep running and wait for more tasks. With the new semantics, it's possible that we kill the executor if a kill task comes for a queued task. We did this optimization for unregistered executor as a stop gap because there was no shutdown executor api call. Per our offline discussion, this is a stop gap for those single task executors that do not have a proper logic for self terminating if no task is received for an extended period of time. I added a TODO here to remove this logic because looks like a bug of the executor. On Feb. 17, 2015, 7:09 p.m., Vinod Kone wrote: src/slave/slave.cpp, lines 2441-2451 https://reviews.apache.org/r/31024/diff/1/?file=863796#file863796line2441 Shouldn't the monitoring be done after the containerizer update completes? I'll leave a TODO here. If 'update' fails, the container will be terminated and thus stopping the monitor. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/#review72747 --- On Feb. 13, 2015, 10:43 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/ --- (Updated Feb. 13, 2015, 10:43 p.m.) Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Changed the slave logic so that it always waits for containerizer update to finish. The key idea is to leverage `executor-queuedTasks` when containerizer-update is in progress. Diffs - include/mesos/type_utils.hpp 32dc6ed489f43ba3695507f59c4c2d94028c8df1 src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 src/tests/master_tests.cpp c678527942506ef23fa4f1aebf9664e4cc21b561 src/tests/registrar_zookeeper_tests.cpp 30d3cd8b1e7d56fbb326c9feccd4c3cbae2015de src/tests/scheduler_tests.cpp 080ec7236246794de6c8cc7356a06331c5e48cd5 src/tests/slave_tests.cpp a02e335576bf68b449a6286fa5cf5093b1b7182a Diff: https://reviews.apache.org/r/31024/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 31008: Refactored EventListener to allow for continuous monitoring of an event.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/#review72825 --- Ship it! Looking good! Only a few nits. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118917 and assumes that arguments are validated. s/public function/public interface/ src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118918 ```s/eventfd_/fd``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118920 Hum, what if `registerNotifier` fails? This will be a segfault! I would suggest reverting the change and making `eventfd` an `Optionint`. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118923 Not your fault, but let's fail the promise here ``` promise.get()-fail(Event listener is terminating); ``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118924 This comment is confusing. I would suggest just kill it. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118927 insert one blank line above. Also, i would suggest the following style: ``` if (reading.isReady() reading.get() == sizeof(data)) { promise.get()-set(data); // ... promise = None(); return; } // Error cases. ``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118928 You don't need std:: prefix here as this is in cpp. Please do a sweep to fix all such code in this file. Thanks! Also, see my comments below. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118929 I would suggest simply set the error directly: ``` if (reading.isDiscarded()) { error = Error(...); } else if (...) { error = Error(...); } else { error = Error(...); } promise.get()-fail(error.get().message); ``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118932 Read less than expected: expect yyy bytes and actual xxx bytes src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118922 Why reordering these fields? src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118921 See my above comments, please revert this change. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118936 Style nits. Align them probably: ``` future .onDiscard(lambda::bind( ...) .onAny(lambda::bind( ...); ``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118934 One more blank line - Jie Yu On Feb. 17, 2015, 8:10 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/ --- (Updated Feb. 17, 2015, 8:10 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: mesos-2136 https://issues.apache.org/jira/browse/mesos-2136 Repository: mesos Description --- Refactored EventListener to allow for continuous monitoring of an event. Diffs - src/linux/cgroups.hpp bf8efee05b995a37d4e6e7679a493b2d5238aa0b src/linux/cgroups.cpp b6f75b14dea7609b90627eccdd33ef891ed9d21f Diff: https://reviews.apache.org/r/31008/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 30911: Updated the filter abstraction for Resources.
On Feb. 16, 2015, 11:43 p.m., Jie Yu wrote: include/mesos/resources.hpp, line 61 https://reviews.apache.org/r/30911/diff/2/?file=863975#file863975line61 We avoid using typedefs as much as possible trying to be more explicit. Alexander Rukletsov wrote: I think it's good place to make an exception to our rules: this typedef makes client code brief and easier to understand. Look for example into `Resources::flatten()`. OK, looks like so if we do not allow 'auto'. Feel free to drop this one. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/#review72660 --- On Feb. 14, 2015, 12:11 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/ --- (Updated Feb. 14, 2015, 12:11 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Bugs: MESOS-2348 https://issues.apache.org/jira/browse/MESOS-2348 Repository: mesos Description --- See [JIRA Ticket](https://issues.apache.org/jira/browse/MESOS-2348). Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/allocator/mesos/hierarchical.hpp 2680d6231927867d5a8d75cbc42b81d6c75fc7f2 src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/hierarchical_allocator_tests.cpp eeecfb64540b1915074aaffaa5d506b203bc src/tests/resources_tests.cpp 3f98782fd437dba808d720bf8e9b94b8fa7e0feb Diff: https://reviews.apache.org/r/30911/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 31024: Changed the slave logic so that it always waits for containerizer update to finish.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/ --- (Updated Feb. 18, 2015, 12:01 a.m.) Review request for mesos, Ian Downes and Vinod Kone. Changes --- Rebased. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Changed the slave logic so that it always waits for containerizer update to finish. The key idea is to leverage `executor-queuedTasks` when containerizer-update is in progress. Diffs (updated) - include/mesos/type_utils.hpp 32dc6ed489f43ba3695507f59c4c2d94028c8df1 src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/master_tests.cpp e69a7fb39e93a9a999e0991985f5d060b72aac65 src/tests/registrar_zookeeper_tests.cpp 30d3cd8b1e7d56fbb326c9feccd4c3cbae2015de src/tests/scheduler_tests.cpp 080ec7236246794de6c8cc7356a06331c5e48cd5 src/tests/slave_tests.cpp 21709e2858d91a2746e0343786f73dec22735da6 Diff: https://reviews.apache.org/r/31024/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 31024: Changed the slave logic so that it always waits for containerizer update to finish.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/ --- (Updated Feb. 17, 2015, 11:19 p.m.) Review request for mesos, Ian Downes and Vinod Kone. Changes --- Vinod's comments. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Changed the slave logic so that it always waits for containerizer update to finish. The key idea is to leverage `executor-queuedTasks` when containerizer-update is in progress. Diffs (updated) - include/mesos/type_utils.hpp 32dc6ed489f43ba3695507f59c4c2d94028c8df1 src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/master_tests.cpp e69a7fb39e93a9a999e0991985f5d060b72aac65 src/tests/registrar_zookeeper_tests.cpp 30d3cd8b1e7d56fbb326c9feccd4c3cbae2015de src/tests/scheduler_tests.cpp 080ec7236246794de6c8cc7356a06331c5e48cd5 src/tests/slave_tests.cpp 21709e2858d91a2746e0343786f73dec22735da6 Diff: https://reviews.apache.org/r/31024/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30906: Added a metric for launcher's container destruction failure.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/#review72812 --- src/slave/containerizer/linux_launcher.cpp https://reviews.apache.org/r/30906/#comment118898 You may wanna add a note here saying that we expect the lifetime of launcher is longer than the time it take for `ns::pid::destroy` to finish. Otherwise, we may delete `this` before onAny is called, leading to a segfault potentially. Can you double check that the above is always enforced in our code (especially in tests). - Jie Yu On Feb. 13, 2015, 10:37 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/ --- (Updated Feb. 13, 2015, 10:37 p.m.) Review request for mesos, Ian Downes and Jie Yu. Bugs: MESOS-1690 https://issues.apache.org/jira/browse/MESOS-1690 Repository: mesos Description --- See summary. Diffs - src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 src/slave/containerizer/linux_launcher.hpp 37304222323d3089b95df605d2077a3f88d11277 src/slave/containerizer/linux_launcher.cpp 3fdfb5709d48d079bbe904a91b773f131131b08d src/slave/containerizer/mesos/containerizer.cpp d5b90d12d63becfeb4c3efa9c6f5d826417f582c src/tests/launcher_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30906/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 29288: stout: Created IP address abstraction for different protocol families
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29288/#review72771 --- Glad to see this being worked on, Thanks for all the effort! This is a partial review. Let's address the following comments first and then I'll take a second pass. One high level comment is that we should separate this giant patch. Many changes are being mixed together, making it very hard for reviewers to digest. For example, you renamed `ip` to `IP::fromLinkDevice`. I think that can be pulled into a separate patch, right? Also, let's try not to introduce helper functions that are not currently being used. You can leave a TODO there. That can also bring this patch smaller. For example, you don't have to add the implementation for IPv6 case in `IP::fromLinkDevice` because it's not being used yet. You can always leave a TODO and fill the implementation later in a separate patch. That'll make the reviewer's life much better. I would suggest we first refactor `net::IP` and separate the concepts of IP address and IP network (see my comments below). No need to introduce the concept of family (and all IPv6 stuffs) yet in this patch. You need to update all the callsites that use `net::IP` (e.g., the port mapping isolator might want to take `net::IPNetwork` instead of `net::IP` in some cases). Let's start with this first! And then, in the second patch, you can make `net::IP` IPv6 capable. That means you need to store a union. Simply add TODO in those helper functions that you think you need to fill IPv6 implementaion in. Let's try to keep that patch small as well. Then, we'll see what we go from there. Does that sound good to you? 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118944 It's partially my fault, but we really should separate the concepts of IP address and IP network. This is pretty standard in other lauguages. For instance, Python has `IPAddress` and `IPNetwork`. Go has `IP` and `IPNet`. Given that we already have `net::IP`, I would sugguest that we use that to represent a single IP address and introduce `net::IPNetwork` to represent an IP network (with netmask/prefix). Highly suggest that you take a look at the Go and Python APIs: ``` http://golang.org/pkg/net/ https://docs.python.org/3/library/ipaddress.html ``` 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118960 I would suggest rename it to `parse`. We need to have both `IP::parse(10.0.0.1)` and `IPNetwork::parse(10.0.0.1/8)`. 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118963 This need to be moved to `IPNetwork`. Suggest rename it to `TryIPNetwork IPNetwork::create(const IP address, const IP netmask)`. 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118965 Ditto above. This should be: ``` TryIPNetwork IPNetwork::create(const IP address, size_t prefix); ``` 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118966 Can you remind me why this is needed? 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118843 Looking at the comments of fields `IP::address_` and `IP::netmask_`, looks like we always store addresses in host order. That makes me wondering how this function is possible? 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118982 Is this needed anywhere? If not, let's keep this patch a pure refactor. 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118981 See my top level comments. Let's add the IPv6 stuffs later in a different patch! 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118983 The renaming can be pulled into a separate patch. See my top level comments. We can also introduce the family later once we start to fill IPv6 stuffs. 3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp https://reviews.apache.org/r/29288/#comment118980 See my top level comments. Let's implement the IPv6 case in a separate patch. - Jie Yu On Feb. 12, 2015, 8:05 p.m., Evelina Dumitrescu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29288/ --- (Updated Feb. 12, 2015, 8:05 p.m.) Review request for mesos, Benjamin Hindman, Dominic Hamon, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Bugs: MESOS-1919 https
Re: Review Request 31008: Refactored EventListener to allow for continuous monitoring of an event.
On Feb. 16, 2015, 10:48 p.m., Jie Yu wrote: src/linux/cgroups.cpp, lines 1204-1210 https://reviews.apache.org/r/31008/diff/1/?file=863519#file863519line1204 To be consistent and symetric, putting `registerNotifier` in `initialize` is better than putting it here. What if the caller creates the process but forgets to spawn it? Do you leak an opened event file? I would rather keep the existing constructor. We can hide the public constructor and make those exposed public functions friend. Validations can go into those public exposed functions. Chi Zhang wrote: If we validate in the public / friended function by calling registerNotifier and closing the fd, can we just assume it'd always succeed in the constructor / initialize (which are void functions)? even if we could, it's still 2 pieces of the same code. I agree with being consistent and symetric, and to address the above leak, how about we put unregisterNotifier to destructor? On the construction side, we use a public creation function as the only construction path that provides explicit validation; the private constructor still takes the opened eventfd to avoid some code duplication? Does it sound like a better balance? IMHO more commonly client code forgets about proper closing after it has got the results than forgetting about getting the results -- i.e., it's probably more likely to forget to 'terminate' than to 'spawn'. Terminate requires an explict call; but compiler invokes destructor for us -- we could reduce the possible leak more for client code this way? Just my 2 cents. Putting `registerNotifier` in static create function and putting `unregiserNotifier` in destructor does not sound very symetric to me. The lifecycle of the eventfd is controlled by two entities: static create and event::Listener. I still sugguest putting `registerNotifier` and `unregisterNotifier` in `initialize` and `finalize`. You need to store an `OptionError error;` in `event::Listener`. If `error.isSome()`, simply fail all `listen` attempts. During initializtion of event::Listener, register the notifier and set `error` if any error occurs. On Feb. 16, 2015, 10:48 p.m., Jie Yu wrote: src/linux/cgroups.cpp, line 1318 https://reviews.apache.org/r/31008/diff/1/?file=863519#file863519line1318 Why do you need a OwnedPromise here? Can it simply be: ``` OptionPromiseuint64 promsie; ``` Chi Zhang wrote: I don't think that'd compile since promise is not copyable and not assignable. More, Promise doesn't seem to have a 'reset' function to reset its associated future to PENDING again (could this be a useful patch, BTW?), which means a new promise has to be created to have a unset future. How about ResultPromise* promise? We can have: None -- whenever there isn't a pending read and no errors have encounterred before. Also the initial value. Some -- Have a pending read, so we can reuse this promise. When the read completes successfully, future is set and promise is (deleted and) reset to none again. Error -- Set if a read is failed. The public listen functiion returns error from this point on and won't start a new listen. Leave it to the client code to clean this process up. Re: what if client discards the future returned by listen: How about change it to discard the current read if it hasn't completed or had an error? If it has, use the above logic. Basically future returned by listen only controls a single listen; client uses the process pointer from the constructor path to control the process' lifetime? IC. I think Owned has a `reset` function. Maybe use that? Regarding the discard semantics, what if there are two entities called `listen`? If one of them discard the future, what happens to the other one? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/#review72643 --- On Feb. 13, 2015, 6:07 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/ --- (Updated Feb. 13, 2015, 6:07 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: mesos-2136 https://issues.apache.org/jira/browse/mesos-2136 Repository: mesos Description --- Refactored EventListener to allow for continuous monitoring of an event. Diffs - src/linux/cgroups.hpp bf8efee05b995a37d4e6e7679a493b2d5238aa0b src/linux/cgroups.cpp b6f75b14dea7609b90627eccdd33ef891ed9d21f Diff: https://reviews.apache.org/r/31008/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 30952: Adding scheduler validations to master
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30952/#review72634 --- Looks nice! Do you want to remove these logics in the drivers (old and new)? src/master/master.cpp https://reviews.apache.org/r/30952/#comment118679 Hum, why do you want to move the 'accept' invokation to the bottom? I would suggest keep this function the same as before. src/master/master.cpp https://reviews.apache.org/r/30952/#comment118680 We usually avoid using non-const references because that makes the reasoning not local. See my comments below, this is not needed. src/master/master.cpp https://reviews.apache.org/r/30952/#comment118681 I would suggest move this logic to `_accept`. See my comments below. src/master/master.cpp https://reviews.apache.org/r/30952/#comment118682 You probably want to move the framework ID filling logic here right before the task validation. You need to make a copy of `task`: ``` // Make a copy of the original task so that we can // fill the missing `framework_id` in ExecutorInfo // if needed. This field was added to the API later // and thus was made optional. TaskInfo _task = task; if (task.has_executor() !task.executor().has_framework_id()) { _task.mutable_executor()-mutable_framework_id()-CopyFrom(framework-id); } // Validate the task. const OptionError validationError = validation::task::validate( _task, framework, slave, _offeredResources); ... ``` src/master/validation.cpp https://reviews.apache.org/r/30952/#comment118683 Do you want to add a test, or there is a test already? - Jie Yu On Feb. 16, 2015, 8:53 a.m., Isabel Jimenez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30952/ --- (Updated Feb. 16, 2015, 8:53 a.m.) Review request for mesos, Jie Yu, Joris Van Remoortere, and Vinod Kone. Bugs: MESOS-2290 https://issues.apache.org/jira/browse/MESOS-2290 Repository: mesos-incubating Description --- There is a scheduler validation (https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L275) missing in master. See motivation on MESOS-2288. Diffs - src/master/master.hpp 6a39df0 src/master/master.cpp f10a3cf src/master/validation.cpp acc35b2 Diff: https://reviews.apache.org/r/30952/diff/ Testing --- make check and distcheck. Tests will be added with new HTTP API endpoints tests. Thanks, Isabel Jimenez
Re: Review Request 30906: Added a metric for launcher's container destruction failure.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/#review72639 --- src/slave/containerizer/linux_launcher.cpp https://reviews.apache.org/r/30906/#comment118684 Any reason not place this metrics at top level (i.e., in `MesosContainerizerProcess::__destroy`)? In that way, you don't need to do `++metrics.destroy_error` multiple times, thoughts? - Jie Yu On Feb. 13, 2015, 10:37 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/ --- (Updated Feb. 13, 2015, 10:37 p.m.) Review request for mesos, Ian Downes and Jie Yu. Bugs: MESOS-1690 https://issues.apache.org/jira/browse/MESOS-1690 Repository: mesos Description --- See summary. Diffs - src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 src/slave/containerizer/linux_launcher.hpp 37304222323d3089b95df605d2077a3f88d11277 src/slave/containerizer/linux_launcher.cpp 3fdfb5709d48d079bbe904a91b773f131131b08d src/slave/containerizer/mesos/containerizer.cpp d5b90d12d63becfeb4c3efa9c6f5d826417f582c src/tests/launcher_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30906/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 30958: PortMappingIsolator: Better ingress qdisc management on eth0 and lo.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30958/#review72640 --- src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/30958/#comment118685 Do you have a ticket explaining why this is needed? src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/30958/#comment118686 If the ingress qdisc is not installed by mesos, will you be removing it? That being said, I want to understand the motivation for this in a JIRA. - Jie Yu On Feb. 13, 2015, 6:24 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30958/ --- (Updated Feb. 13, 2015, 6:24 p.m.) Review request for mesos, Jie Yu and Cong Wang. Repository: mesos Description --- PortMappingIsolator: Better ingress qdisc management on eth0 and lo. Remove ingress qdiscs on eth0 and lo after the last container exits. They will be added back when the next container comes in. This makes sure after a clean shutdown, the sytem will be clean. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp b860b17e59486cd6097183a4d3139fbd5c897b44 Diff: https://reviews.apache.org/r/30958/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 31008: Refactored EventListener to allow for continuous monitoring of an event.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/#review72643 --- Thanks Chi! This is my first pass. Will do a second pass one the following issues are addressed. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118687 We usually call a libprocess process a 'process'. So please revert this change. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118703 To be consistent and symetric, putting `registerNotifier` in `initialize` is better than putting it here. What if the caller creates the process but forgets to spawn it? Do you leak an opened event file? I would rather keep the existing constructor. We can hide the public constructor and make those exposed public functions friend. Validations can go into those public exposed functions. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118704 OK, let's document the semantics for this public function. The semantics you have is slightly different than I initially thought, but I think you sematics is simpler and easy to implement. Your semantics is actually waiting for the *next* event to occur and returns the value read from the event file. Calling multiple `read` returns the same future which will be satisfied when the *next* event occurs. Because of that, I think we probably should rename `read` to `listen`. What do you think? Also, what is the discard semantics. What if the user discard the returned future, what's gonna happen? src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118709 See my comments below. No need to print path and args as they should be printed by the caller. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118710 Ditto.. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118708 Thanks for expanding the error message. However, you don't need to print the path and args because these are caller's responsibility. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118707 So you reset the promise once a `read` has finished, even if it has failed. This is problematic because what if the previous read only returns a partial result (i.e., reading.get() != sizeof(data)), will that affect the following reads? I guess the behavior is undefined if a previous read returns a failure. So you probably want to return failure if there is a failure in the previous read. src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118706 Why do you need a OwnedPromise here? Can it simply be: ``` OptionPromiseuint64 promsie; ``` src/linux/cgroups.cpp https://reviews.apache.org/r/31008/#comment118711 You should not call a method in a process directly. Please use dispatch! - Jie Yu On Feb. 13, 2015, 6:07 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31008/ --- (Updated Feb. 13, 2015, 6:07 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: mesos-2136 https://issues.apache.org/jira/browse/mesos-2136 Repository: mesos Description --- Refactored EventListener to allow for continuous monitoring of an event. Diffs - src/linux/cgroups.hpp bf8efee05b995a37d4e6e7679a493b2d5238aa0b src/linux/cgroups.cpp b6f75b14dea7609b90627eccdd33ef891ed9d21f Diff: https://reviews.apache.org/r/31008/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 29742: Added utility functions to determine types of resources.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29742/#review72659 --- Ship it! Thanks for the cleanup! src/master/validation.cpp https://reviews.apache.org/r/29742/#comment118712 The indent here should be 4 spaces. - Jie Yu On Feb. 13, 2015, 10:59 p.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29742/ --- (Updated Feb. 13, 2015, 10:59 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Repository: mesos Description --- # Motivation The main motivation for introducing these functions is to capture the definition of various identification of resources. With these functions capturing various definitions of concepts for us, we gain: - readability. - engineering benefits. ## Example For example, consider the concept of persistent volume. Currently we do `if (resource.has_disk() resource.has_persistence())` throughout the codebase to test to identify this type of resource. ### Readability From a readability perspective, `if (resource.has_disk() resource.has_persistence())` simply harder to read than `if (Resource::persistentVolume(resource))`. A foreign reader also can't be sure that the first predicate is checking for a persistent volume without digging deeper into the codebase. (Maybe we actually have an additional requirement for a resource to be considered a persistent volume.) ### Engineering Benefit If and when we realize that the definition needs to be updated, we shouldn't have to change the predicate every `if` statement that checks for a persistent volume. If you're thinking, just grep for `if (resource.has_disk() resource.has_persistence())`..., what if we didn't use `resource` as the variable name? what if we actually did `if (!(resource.has_disk() resource.has_persistence()))`? what about `if (!resource.has_disk() || !resource.has_persistence()))`? In general I believe this approach makes it hard to keep the definitions consistent throughout the codebase. Instead, we should consistently use the predicates that capture the definition, (e.g. `Resource::persistentVolume(resource)`) and later on if we change the definition of persistent volume, we simply update the definition of `persistentVolume` and we're done. ## Why not just have a Filter instead? Fundamentally a `Filter` is built on a **unary predicate**. Given a list of elements, we keep elements that satisfy the predicate. We *could* embed these predicates into a `Filter` and only provide those. But 1. I don't think a `Filter` is necessarily the right tool for every job. 2. Unary predicates are the basis of many algorithms (e.g. `all_of`, `any_of`, `none_of`, `count_if`, `find_if`) and therefore deserve to exist in their own right. Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 Diff: https://reviews.apache.org/r/29742/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 30911: Updated the filter abstraction for Resources.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/#review72660 --- include/mesos/resources.hpp https://reviews.apache.org/r/30911/#comment118713 We avoid using typedefs as much as possible trying to be more explicit. include/mesos/resources.hpp https://reviews.apache.org/r/30911/#comment118715 See my comments below. Can we punt this for now and leave a TODO for supporing filtering for RepeatedPtrField? This struct itself is confusing because we have `reserved` as well. ``` Resources::Reserved(role1) ``` vs ``` resources.reserved(role1) ``` I think being explicit and using lambda::bind is not a bad thing. As least the readers immediately know that `filter` takes a predicate function. What do you think? include/mesos/resources.hpp https://reviews.apache.org/r/30911/#comment118716 I don't think we should remove these two helper functions as they are quite convinent to use. (just like we still have empty() in std containers even if we can use size() == 0 to replace it). include/mesos/resources.hpp https://reviews.apache.org/r/30911/#comment118714 Can we make `filter` function a member of `Resources` instead. We usually prefer member functions than global functions. Also, looking at the callsites, this version does not read well. For example: ``` filter(Resources::Reserved(role2), slave2.resources()) ``` It's not obvious what does this function do. For instance, Readers might think that `Reserved(role2)` is a function which returns all resources reserved for `role2`. I understand that you want to make filter seamlessly work for RepeatedPtrField. Let's punt that for now and leave a TODO. For the above case, simply do the following for now (i.e., still keep `reserved` and `unreserved`). ``` Resources(slave2.resourecs()).reserved(role2); ``` - Jie Yu On Feb. 14, 2015, 12:11 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30911/ --- (Updated Feb. 14, 2015, 12:11 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Bugs: MESOS-2348 https://issues.apache.org/jira/browse/MESOS-2348 Repository: mesos Description --- See [JIRA Ticket](https://issues.apache.org/jira/browse/MESOS-2348). Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/allocator/mesos/hierarchical.hpp 2680d6231927867d5a8d75cbc42b81d6c75fc7f2 src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 src/tests/hierarchical_allocator_tests.cpp eeecfb64540b1915074aaffaa5d506b203bc src/tests/resources_tests.cpp 3f98782fd437dba808d720bf8e9b94b8fa7e0feb Diff: https://reviews.apache.org/r/30911/diff/ Testing --- make check Thanks, Michael Park
Review Request 31024: Changed the slave logic so that it always waits for containerizer update to finish.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31024/ --- Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Changed the slave logic so that it always waits for containerizer update to finish. The key idea is to leverage `executor-queuedTasks` when containerizer-update is in progress. Diffs - include/mesos/type_utils.hpp 32dc6ed489f43ba3695507f59c4c2d94028c8df1 src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 src/tests/master_tests.cpp c678527942506ef23fa4f1aebf9664e4cc21b561 src/tests/registrar_zookeeper_tests.cpp 30d3cd8b1e7d56fbb326c9feccd4c3cbae2015de src/tests/scheduler_tests.cpp 080ec7236246794de6c8cc7356a06331c5e48cd5 src/tests/slave_tests.cpp a02e335576bf68b449a6286fa5cf5093b1b7182a Diff: https://reviews.apache.org/r/31024/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30906: Added a metric for launcher's container destruction failure.
On Feb. 12, 2015, 5:41 p.m., Ben Mahler wrote: src/slave/containerizer/mesos/containerizer.hpp, line 280 https://reviews.apache.org/r/30906/diff/1/?file=861366#file861366line280 Should this just be mesos_containerizer/launcher_destroy_errors? Looking at our port_mapping/ metrics, and other subcomponent metrics, that seems like a more consistent name? Ben Mahler wrote: Or, is it possible to have this error in the launcher and just call it: launcher/destroy_errors or linux_launcher/destroy_errors? Or we can modify port_mapping metrics to be `slave/containerizer/mesos/isolators/network/port_mapping/xxx`. Maybe that's too long? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/#review72190 --- On Feb. 12, 2015, 1:34 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30906/ --- (Updated Feb. 12, 2015, 1:34 a.m.) Review request for mesos, Ian Downes and Jie Yu. Bugs: MESOS-1690 https://issues.apache.org/jira/browse/MESOS-1690 Repository: mesos Description --- See summary. Diffs - src/slave/containerizer/mesos/containerizer.hpp 074a2d82dcd882e52f8cd62ed7493295596acb26 src/slave/containerizer/mesos/containerizer.cpp d5b90d12d63becfeb4c3efa9c6f5d826417f582c Diff: https://reviews.apache.org/r/30906/diff/ Testing --- make check No existing launcher tests to inject metrics test. Will work with @idownes to figure how to write one. Thanks, Vinod Kone
Review Request 30945: Waited containerizer update to finish before sending task to the executor in Slave::_runTask.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30945/ --- Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Waited containerizer update to finish before sending task to the executor in Slave::_runTask. Diffs - src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 Diff: https://reviews.apache.org/r/30945/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30945: Waited containerizer update to finish before sending task to the executor in Slave::_runTask.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30945/#review72227 --- Just realized that there still a race here with killTask (killTask happens between `_runTask` and `__runTask`). We could end up with the situation where the executor receives an KillTaskMessage with an unknown task since RunTaskMessage hasn't reached the executor yet. :( - Jie Yu On Feb. 12, 2015, 7:30 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30945/ --- (Updated Feb. 12, 2015, 7:30 p.m.) Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Waited containerizer update to finish before sending task to the executor in Slave::_runTask. Diffs - src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 Diff: https://reviews.apache.org/r/30945/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 29742: Added utility functions to determine types of resources.
On Feb. 3, 2015, 8:19 p.m., Vinod Kone wrote: include/mesos/resources.hpp, line 92 https://reviews.apache.org/r/29742/diff/4/?file=839661#file839661line92 We didn't do isempty above, so how about getting rid of is as prefix? I think returning a bool signals a is. Michael Park wrote: Hm, that's a good point. I'll fix that. Michael Park wrote: Fixed. Hum, actually, I prefer the name 'isEmpty', 'isPersistentVolume', 'isReserved' because there are 'reserved' and 'unreserved' functions (same name) in Resources which return reserved/unreserved resources (different purpose). - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29742/#review70817 --- On Feb. 11, 2015, 11:49 p.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29742/ --- (Updated Feb. 11, 2015, 11:49 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Repository: mesos Description --- # Motivation The main motivation for introducing these functions is to capture the definition of various identification of resources. With these functions capturing various definitions of concepts for us, we gain: - readability. - engineering benefits. ## Example For example, consider the concept of persistent volume. Currently we do `if (resource.has_disk() resource.has_persistence())` throughout the codebase to test to identify this type of resource. ### Readability From a readability perspective, `if (resource.has_disk() resource.has_persistence())` simply harder to read than `if (Resource::persistentVolume(resource))`. A foreign reader also can't be sure that the first predicate is checking for a persistent volume without digging deeper into the codebase. (Maybe we actually have an additional requirement for a resource to be considered a persistent volume.) ### Engineering Benefit If and when we realize that the definition needs to be updated, we shouldn't have to change the predicate every `if` statement that checks for a persistent volume. If you're thinking, just grep for `if (resource.has_disk() resource.has_persistence())`..., what if we didn't use `resource` as the variable name? what if we actually did `if (!(resource.has_disk() resource.has_persistence()))`? what about `if (!resource.has_disk() || !resource.has_persistence()))`? In general I believe this approach makes it hard to keep the definitions consistent throughout the codebase. Instead, we should consistently use the predicates that capture the definition, (e.g. `Resource::persistentVolume(resource)`) and later on if we change the definition of persistent volume, we simply update the definition of `persistentVolume` and we're done. ## Why not just have a Filter instead? Fundamentally a `Filter` is built on a **unary predicate**. Given a list of elements, we keep elements that satisfy the predicate. We *could* embed these predicates into a `Filter` and only provide those. But 1. I don't think a `Filter` is necessarily the right tool for every job. 2. Unary predicates are the basis of many algorithms (e.g. `all_of`, `any_of`, `none_of`, `count_if`, `find_if`) and therefore deserve to exist in its own right. Diffs - include/mesos/resources.hpp c7cc46e0183ea97013dd088a717da6c0e6ed5cf0 src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/master/master.hpp 6a39df04514c756415354fae66c5835ada191c52 src/master/validation.cpp acc35b25c93f2d3900d79c3070b1d681416ed66b src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 Diff: https://reviews.apache.org/r/29742/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 30945: Waited containerizer update to finish before sending task to the executor in Slave::_runTask.
On Feb. 12, 2015, 7:47 p.m., Jie Yu wrote: Just realized that there still a race here with killTask (killTask happens between `_runTask` and `__runTask`). We could end up with the situation where the executor receives an KillTaskMessage with an unknown task since RunTaskMessage hasn't reached the executor yet. :( Timothy Chen wrote: True if kill task is sent without any task status update yet, should we have a new state for the executor so that we don't proxy until it's fully ready? Or what's in your mind to fix this? I am thinking about using the queuedTasks because it's in fact queued (and not yet launched). - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30945/#review72227 --- On Feb. 12, 2015, 7:30 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30945/ --- (Updated Feb. 12, 2015, 7:30 p.m.) Review request for mesos, Ian Downes and Vinod Kone. Bugs: MESOS-998 https://issues.apache.org/jira/browse/MESOS-998 Repository: mesos Description --- Waited containerizer update to finish before sending task to the executor in Slave::_runTask. Diffs - src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 Diff: https://reviews.apache.org/r/30945/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30850: Added validation for checkpointed resources during slave recovery.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30850/ --- (Updated Feb. 11, 2015, 6:38 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Added validation for checkpointed resources during slave recovery. This is a rebased version for https://reviews.apache.org/r/29913/ Diffs - src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 src/tests/mesos.hpp 2b0c90d166ed33065f2a4a2909c61cc5119da511 src/tests/mesos.cpp ac2dc737eff48d122cfd604d4820f423e5bc0472 src/tests/persistent_volume_tests.cpp b4f2298069233b27f6b4b8d668753810f43bb58d Diff: https://reviews.apache.org/r/30850/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30850: Added validation for checkpointed resources during slave recovery.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30850/ --- (Updated Feb. 11, 2015, 6:38 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Repository: mesos Description --- Added validation for checkpointed resources during slave recovery. This is a rebased version for https://reviews.apache.org/r/29913/ Diffs (updated) - src/slave/slave.hpp 7a399f6df50c69b7e1e12d74f076fa57b6edb1b3 src/slave/slave.cpp f39a876cdd6b580a7a75fd053e6923761bee7635 src/tests/mesos.hpp 2b0c90d166ed33065f2a4a2909c61cc5119da511 src/tests/mesos.cpp ac2dc737eff48d122cfd604d4820f423e5bc0472 src/tests/persistent_volume_tests.cpp b4f2298069233b27f6b4b8d668753810f43bb58d Diff: https://reviews.apache.org/r/30850/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30812: Updated the allocator in Master::addSlave using slave-totalResources.
On Feb. 11, 2015, 12:32 a.m., Ben Mahler wrote: Before you commit this, mind adding a test to ensure that the checkpointed resources are offered? (in a follow up patch) Added a test in https://reviews.apache.org/r/30900/ - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30812/#review71886 --- On Feb. 10, 2015, 1:35 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30812/ --- (Updated Feb. 10, 2015, 1:35 a.m.) Review request for mesos and Ben Mahler. Repository: mesos Description --- Updated the allocator in Master::addSlave using slave-totalResources. Diffs - src/master/master.cpp 7c3aa220e72fdc156fb9a0998dd68beb7c464256 Diff: https://reviews.apache.org/r/30812/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30900: Added a persistent volume test for verifying master failover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30900/ --- Review request for mesos, Ben Mahler and Vinod Kone. Repository: mesos Description --- Added a persistent volume test for verifying master failover. Followup for https://reviews.apache.org/r/30812/ Diffs - src/tests/persistent_volume_tests.cpp b4f2298069233b27f6b4b8d668753810f43bb58d Diff: https://reviews.apache.org/r/30900/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30861: Consistency fix for slave logging.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30861/ --- Review request for mesos, Ben Mahler and Vinod Kone. Repository: mesos Description --- Consistency fix for slave logging. Removed `'` for framework/executor/container/task ids. Removed empty leading spaces. Diffs - src/slave/slave.cpp 8a1b12ebd0fad0c427624a8bacc47205b4311fc9 Diff: https://reviews.apache.org/r/30861/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30850: Added validation for checkpointed resources during slave recovery.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30850/ --- (Updated Feb. 10, 2015, 11:10 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Added validation for checkpointed resources during slave recovery. This is a rebased version for https://reviews.apache.org/r/29913/ Diffs - src/slave/slave.hpp 35c7c8bf6eb8efee332c13713f6087c6f899d6e3 src/slave/slave.cpp 7a29f86bf873141759957a6ef5502f6a3b6ab269 src/tests/mesos.hpp 83a369968ab2403fa341829ac5d11f7243095190 src/tests/mesos.cpp 21a405366f56c963611324076efe775f85b9d9f7 src/tests/persistent_volume_tests.cpp e3a8d33b6ae2b6afe5f4ab7c029b2f37a861f112 Diff: https://reviews.apache.org/r/30850/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30850: Added validation for checkpointed resources during slave recovery.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30850/ --- Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Added validation for checkpointed resources during slave recovery. This is a rebased version for https://reviews.apache.org/r/29913/ Diffs - src/slave/slave.hpp 35c7c8bf6eb8efee332c13713f6087c6f899d6e3 src/slave/slave.cpp 7a29f86bf873141759957a6ef5502f6a3b6ab269 src/tests/mesos.hpp 83a369968ab2403fa341829ac5d11f7243095190 src/tests/mesos.cpp 21a405366f56c963611324076efe775f85b9d9f7 src/tests/persistent_volume_tests.cpp e3a8d33b6ae2b6afe5f4ab7c029b2f37a861f112 Diff: https://reviews.apache.org/r/30850/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30812: Updated the allocator in Master::addSlave using slave-totalResources.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30812/ --- Review request for mesos and Ben Mahler. Repository: mesos Description --- Updated the allocator in Master::addSlave using slave-totalResources. Diffs - src/master/master.cpp 7c3aa220e72fdc156fb9a0998dd68beb7c464256 Diff: https://reviews.apache.org/r/30812/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/ --- (Updated Feb. 9, 2015, 7:33 p.m.) Review request for mesos and Ben Mahler. Changes --- Vinod's comments. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Started to maintain and checkpoint persisted resource in slave. That includes: 1) responds to update resources message 2) checkpoint resources 3) recover checkpointed resources 4) send checkpointed resources during register/reregister Diffs (updated) - src/slave/slave.hpp 9adee17cb94a72f0e1e139b3fd8978a9a1ff6237 src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f src/slave/state.hpp 04084fc5c9130bf7d824b2ced4eb7053a995edd0 src/tests/persistent_volume_tests.cpp ffbaedddbd21d0eb7964be0bff4368384389b0a0 Diff: https://reviews.apache.org/r/28809/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30508: Prepare persistent volumes in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30508/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- This patch prepares persistent volumes in slave when receiving new task/executor. Diffs (updated) - src/slave/slave.hpp 9adee17cb94a72f0e1e139b3fd8978a9a1ff6237 src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f Diff: https://reviews.apache.org/r/30508/diff/ Testing --- make check. Thanks, Jie Yu
Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/ --- (Updated Feb. 9, 2015, 5:59 p.m.) Review request for mesos and Ben Mahler. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Started to maintain and checkpoint persisted resource in slave. That includes: 1) responds to update resources message 2) checkpoint resources 3) recover checkpointed resources 4) send checkpointed resources during register/reregister Diffs (updated) - src/slave/slave.hpp 9adee17cb94a72f0e1e139b3fd8978a9a1ff6237 src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f src/slave/state.hpp 04084fc5c9130bf7d824b2ced4eb7053a995edd0 src/tests/persistent_volume_tests.cpp ffbaedddbd21d0eb7964be0bff4368384389b0a0 Diff: https://reviews.apache.org/r/28809/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30509: Added executor working directory to Container struct in Mesos containerizer.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Added executor working directory to Container struct in Mesos containerizer. The executor directory is needed when linking persistent volumes inside the container. Diffs (updated) - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 Diff: https://reviews.apache.org/r/30509/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs (updated) - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.cpp, lines 1161-1167 https://reviews.apache.org/r/30510/diff/1/?file=843874#file843874line1161 How can we ensure that such a filesystem isolator is being used? Shouldn't all of this functionality be in one place!? Jie Yu wrote: For now, in master, we will only allow relative non-nested container path. Revised the comments. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/#review70617 --- On Feb. 9, 2015, 6 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
On Feb. 9, 2015, 8:24 p.m., Vinod Kone wrote: src/slave/containerizer/mesos/containerizer.cpp, line 1157 https://reviews.apache.org/r/30510/diff/3/?file=858756#file858756line1157 Why is this a CHECK? What is the guarantee that a container won't be cleaned up? Added a comment. This method is only supposed to be called when the container is alive. On Feb. 9, 2015, 8:24 p.m., Vinod Kone wrote: src/slave/containerizer/mesos/containerizer.cpp, lines 819-820 https://reviews.apache.org/r/30510/diff/3/?file=858756#file858756line819 Why pull it into the middle of 'futures' handling? Added a comment. On Feb. 9, 2015, 8:24 p.m., Vinod Kone wrote: src/slave/containerizer/mesos/containerizer.cpp, line 1181 https://reviews.apache.org/r/30510/diff/3/?file=858756#file858756line1181 Add a comment here, Done. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/#review71690 --- On Feb. 9, 2015, 6 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30508: Prepare persistent volumes in slave.
On Feb. 9, 2015, 8:12 p.m., Vinod Kone wrote: src/slave/slave.cpp, line 3765 https://reviews.apache.org/r/30508/diff/2/?file=858752#file858752line3765 also, add a check for persistence? This is not needed because the if guard above should have already verified it. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30508/#review71689 --- On Feb. 9, 2015, 6 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30508/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- This patch prepares persistent volumes in slave when receiving new task/executor. Diffs - src/slave/slave.hpp 9adee17cb94a72f0e1e139b3fd8978a9a1ff6237 src/slave/slave.cpp fff2d725fe49eee984d9151cfb2131202c47994f Diff: https://reviews.apache.org/r/30508/diff/ Testing --- make check. Thanks, Jie Yu
Re: Review Request 30508: Prepare persistent volumes in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30508/ --- (Updated Feb. 9, 2015, 10:37 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Moved the logic to checkpointResources. Added a test. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- This patch prepares persistent volumes in slave when receiving new task/executor. Diffs (updated) - src/slave/slave.cpp 7a29f86bf873141759957a6ef5502f6a3b6ab269 src/tests/persistent_volume_tests.cpp e3a8d33b6ae2b6afe5f4ab7c029b2f37a861f112 Diff: https://reviews.apache.org/r/30508/diff/ Testing --- make check. Thanks, Jie Yu
Re: Review Request 30509: Added executor working directory to Container struct in Mesos containerizer.
On Feb. 5, 2015, 1:24 a.m., Timothy Chen wrote: src/slave/containerizer/mesos/containerizer.hpp, line 260 https://reviews.apache.org/r/30509/diff/2/?file=849046#file849046line260 I wonder if we should consolidate our terminalogy for the working directory. I've been calling it sandbox in the docker containerizer, and sometimes it's referred to workDir, and here just directory. Can we just call this sandbox? I agree that we should make it consistent. However, I have a different opinion about whether the executor working directory should be named as 'sandbox'. The main reason is that we may want to add more stuff to executor's working directory and make 'sandbox' the actual sandbox for executors/tasks. For example, the following layout: ``` executor_working_dir |--- sandbox/ // the actual sandbox for executor/tasks |--- volumes/ // volumes for the executor/tasks, will be bind mount into container under /mnt/mesos/volumes ``` Since MesosContainerizer currently uses 'directory' as the executor working directory already, I would rather keep it consistent for now. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/#review71118 --- On Feb. 9, 2015, 6 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/ --- (Updated Feb. 9, 2015, 6 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Added executor working directory to Container struct in Mesos containerizer. The executor directory is needed when linking persistent volumes inside the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 Diff: https://reviews.apache.org/r/30509/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 10, 2015, 1:09 a.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Changes --- Review comments. Added tests. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs (updated) - src/slave/containerizer/mesos/containerizer.hpp b3aafe4efa9f3469d7a3fd39243ad66b46d6a54d src/slave/containerizer/mesos/containerizer.cpp fa40d47aee7803833bcde6cce1e86a21d7cf27d0 src/tests/persistent_volume_tests.cpp e3a8d33b6ae2b6afe5f4ab7c029b2f37a861f112 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30694: Fixed bugs in CREATE/DESTROY operation handlers and added tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30694/ --- (Updated Feb. 6, 2015, 11:31 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Fixed bugs in CREATE/DESTROY operation handlers and added tests. Diffs (updated) - include/mesos/resources.hpp 3b57568c10233a0c692787de6464f21af5eaadf4 src/Makefile.am 93537d17d3c7604a8532ee1453e405630c481ddc src/master/master.hpp c3c77f840f089c5754c764e7f150a3c1971d311f src/master/master.cpp 22fece79c6e511a1b61eb674d7f82216e7a25e00 src/tests/persistent_volume_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30694/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30694: Fixed bugs in CREATE/DESTROY operation handlers and added tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30694/ --- (Updated Feb. 6, 2015, 9:43 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Fixed bugs in CREATE/DESTROY operation handlers and added tests. Diffs (updated) - include/mesos/resources.hpp 3b57568c10233a0c692787de6464f21af5eaadf4 src/Makefile.am 93537d17d3c7604a8532ee1453e405630c481ddc src/master/master.hpp c3c77f840f089c5754c764e7f150a3c1971d311f src/master/master.cpp 22fece79c6e511a1b61eb674d7f82216e7a25e00 src/tests/persistent_volume_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30694/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30694: Fixed bugs in CREATE/DESTROY operation handlers and added tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30694/ --- (Updated Feb. 6, 2015, 7:47 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Fixed bugs in CREATE/DESTROY operation handlers and added tests. Diffs (updated) - include/mesos/resources.hpp 3b57568c10233a0c692787de6464f21af5eaadf4 src/Makefile.am 93537d17d3c7604a8532ee1453e405630c481ddc src/master/master.hpp dcfd38ae2fa9e1bd0b477e9719724dba37114d30 src/master/master.cpp 234bbecc4205036d790b026abd59100eb188f913 src/tests/persistent_volume_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30694/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30694: Fixed bugs in CREATE/DESTROY operation handlers and added tests.
On Feb. 6, 2015, 1:25 a.m., Ben Mahler wrote: src/master/master.cpp, lines 4787-4792 https://reviews.apache.org/r/30694/diff/1/?file=851240#file851240line4787 Hm.. relying on this error to determine whether the volume applies to checkpointed resources seems a bit implicit. How do we know it's not due to some other error? :) Have you considered something like the following approach: ``` // There should be an easy way to get the slave's current total resources // (checkpointed + non-checkpointed) since we likely need to use this // elsewhere too (e.g. http endpoints). Resources slaveResources; // checkpointed + non-checkopinted // Add the volumes. TryResources resources = slaveResources.apply(operation.create()); CHECK_SOME(resources); slaveResources = resources.get(); slave-checkpointedResources = slaveResources.filter(checkpointFilter); ``` This approach should work for the DESTROY case as well, thoughts? Refactored. Much better. Let me know! - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30694/#review71316 --- On Feb. 6, 2015, 7:47 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30694/ --- (Updated Feb. 6, 2015, 7:47 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Fixed bugs in CREATE/DESTROY operation handlers and added tests. Diffs - include/mesos/resources.hpp 3b57568c10233a0c692787de6464f21af5eaadf4 src/Makefile.am 93537d17d3c7604a8532ee1453e405630c481ddc src/master/master.hpp dcfd38ae2fa9e1bd0b477e9719724dba37114d30 src/master/master.cpp 234bbecc4205036d790b026abd59100eb188f913 src/tests/persistent_volume_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30694/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.
On Feb. 5, 2015, 7:38 p.m., Ben Mahler wrote: src/slave/slave.cpp, lines 1335-1353 https://reviews.apache.org/r/28809/diff/4/?file=848405#file848405line1335 Is it safe to do this once on the combined resources? ``` Resources resources = task.resources(); if (task.has_executor()) { resources += task.executor().resources(); } foreach (const Resource resource, resources) { ... } ``` It's safe, but I want to print different ERROR messages for task and executor. On Feb. 5, 2015, 7:38 p.m., Ben Mahler wrote: src/slave/slave.cpp, lines 3478-3479 https://reviews.apache.org/r/28809/diff/4/?file=848405#file848405line3478 It's too bad the `errors` are just unsigned ints and not a collection of `OptionError`s that we can print here. Added a TODO. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/#review71279 --- On Feb. 4, 2015, 7:13 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/ --- (Updated Feb. 4, 2015, 7:13 p.m.) Review request for mesos and Ben Mahler. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Started to maintain and checkpoint persisted resource in slave. That includes: 1) responds to update resources message 2) checkpoint resources 3) recover checkpointed resources 4) send checkpointed resources during register/reregister Diffs - src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 src/slave/slave.cpp a8b262174ab5c9a524db8318d3d1438cd75a702b Diff: https://reviews.apache.org/r/28809/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/ --- (Updated Feb. 6, 2015, 12:05 a.m.) Review request for mesos and Ben Mahler. Changes --- Addressed review comments and added tests. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Started to maintain and checkpoint persisted resource in slave. That includes: 1) responds to update resources message 2) checkpoint resources 3) recover checkpointed resources 4) send checkpointed resources during register/reregister Diffs (updated) - src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 src/slave/slave.cpp a8b262174ab5c9a524db8318d3d1438cd75a702b src/slave/state.hpp f92808ad9b1623cea0c35ec735c53a3d6457bdbe src/tests/persistent_volume_tests.cpp PRE-CREATION Diff: https://reviews.apache.org/r/28809/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30682: Changed default disk quota check interval to be 15 seconds.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30682/ --- Review request for mesos and Vinod Kone. Repository: mesos Description --- Changed default disk quota check interval to be 15 seconds. Diffs - src/slave/flags.hpp facf6d3e68e412897a1381d7bbdcceaf5d1fbb94 Diff: https://reviews.apache.org/r/30682/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30624: Refactor and increase test coverage for slave paths
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30624/#review70989 --- Ship it! src/tests/paths_tests.cpp https://reviews.apache.org/r/30624/#comment116478 2 lines apart here please:) src/tests/paths_tests.cpp https://reviews.apache.org/r/30624/#comment116480 s/Pid/LibprocessPid/ Because we also have forked pid. - Jie Yu On Feb. 4, 2015, 5:43 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30624/ --- (Updated Feb. 4, 2015, 5:43 p.m.) Review request for mesos and Jie Yu. Bugs: MESOS-2314 https://issues.apache.org/jira/browse/MESOS-2314 Repository: mesos Description --- Refactor and increase test coverage for slave paths Diffs - src/tests/paths_tests.cpp 417aa51b99d54055ad7254b4f274fb59d0794ca6 Diff: https://reviews.apache.org/r/30624/diff/ Testing --- make check Thanks, Dominic Hamon
Re: Review Request 30531: Remove strings::format and unnecessary constants from paths
On Feb. 4, 2015, 1:32 a.m., Jie Yu wrote: src/slave/paths.cpp, line 453 https://reviews.apache.org/r/30531/diff/1/?file=844643#file844643line453 const string directory Dominic Hamon wrote: the function returns a string by value. there's no improvement here using a const ref (the copy will be elided). using a plan string is much clearer i think. Jie Yu wrote: Does it save a call to the copy constructor (from the temp object to 'directory')? Or the compiler will optimize it? OK, I found this http://en.wikipedia.org/wiki/Copy_elision As long as the copy constructor of std::string does not have side effect, the copy will be elided. Look good to me. Feel free to commit. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/#review70885 --- On Feb. 4, 2015, 5:22 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/ --- (Updated Feb. 4, 2015, 5:22 p.m.) Review request for mesos and Jie Yu. Bugs: MESOS-2314 https://issues.apache.org/jira/browse/MESOS-2314 Repository: mesos Description --- Remove strings::format and unnecessary constants from paths Diffs - src/slave/paths.hpp 4d6897f9eebcd6852f0223bae23a29729a42e750 src/slave/paths.cpp fe951abac1254748dbd5bdd81b7e50da75afad6d Diff: https://reviews.apache.org/r/30531/diff/ Testing --- Thanks, Dominic Hamon
Re: Review Request 30531: Remove strings::format and unnecessary constants from paths
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/#review70984 --- Ship it! src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116466 please use const reference here const string src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116468 Ditto. src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116469 Ditto. src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116470 Ditto. - Jie Yu On Feb. 4, 2015, 5:22 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/ --- (Updated Feb. 4, 2015, 5:22 p.m.) Review request for mesos and Jie Yu. Bugs: MESOS-2314 https://issues.apache.org/jira/browse/MESOS-2314 Repository: mesos Description --- Remove strings::format and unnecessary constants from paths Diffs - src/slave/paths.hpp 4d6897f9eebcd6852f0223bae23a29729a42e750 src/slave/paths.cpp fe951abac1254748dbd5bdd81b7e50da75afad6d Diff: https://reviews.apache.org/r/30531/diff/ Testing --- Thanks, Dominic Hamon
Re: Review Request 30531: Remove strings::format and unnecessary constants from paths
On Feb. 4, 2015, 1:32 a.m., Jie Yu wrote: src/slave/paths.cpp, line 453 https://reviews.apache.org/r/30531/diff/1/?file=844643#file844643line453 const string directory Dominic Hamon wrote: the function returns a string by value. there's no improvement here using a const ref (the copy will be elided). using a plan string is much clearer i think. Does it save a call to the copy constructor (from the temp object to 'directory')? Or the compiler will optimize it? - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/#review70885 --- On Feb. 4, 2015, 5:22 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/ --- (Updated Feb. 4, 2015, 5:22 p.m.) Review request for mesos and Jie Yu. Bugs: MESOS-2314 https://issues.apache.org/jira/browse/MESOS-2314 Repository: mesos Description --- Remove strings::format and unnecessary constants from paths Diffs - src/slave/paths.hpp 4d6897f9eebcd6852f0223bae23a29729a42e750 src/slave/paths.cpp fe951abac1254748dbd5bdd81b7e50da75afad6d Diff: https://reviews.apache.org/r/30531/diff/ Testing --- Thanks, Dominic Hamon
Re: Review Request 30628: Add an optional container rootfs for MesosContainerizer.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30628/#review70999 --- Ship it! - Jie Yu On Feb. 4, 2015, 6:07 p.m., Ian Downes wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30628/ --- (Updated Feb. 4, 2015, 6:07 p.m.) Review request for mesos and Jie Yu. Repository: mesos Description --- This is in preparation for supporting different root filesystems for containers using the MesosContainerizer. Diffs - src/slave/containerizer/mesos/containerizer.hpp 802988c90ac872b0cefa5e28f06e6fec98e8d032 Diff: https://reviews.apache.org/r/30628/diff/ Testing --- Thanks, Ian Downes
Re: Review Request 29918: Introduced a generic checkpoint function.
On Jan. 23, 2015, 6:44 p.m., Jie Yu wrote: src/slave/state.hpp, line 120 https://reviews.apache.org/r/29918/diff/9/?file=829596#file829596line120 One more thought: Can we introduce a third parameter for this function to let users opt out the temp + rename (we can use a default value)? This function is super critical. What if /tmp has too many links? It'll cause slave to flap. For majority of the checkpointing work, we don't need this temp + rename because we know the old file does not exist. What do you think? Michael Park wrote: But if we want to guarantee that the state of the checkpointed file is always valid, I don't think we can get away from writing to temp + renaming. For the case where the old file does not exist, we'd do something like, write directly to the file if it doesn't already exist, and delete on error? We would end up with an invalid checkpoint file if we crash while writing, or run into an error and crash before deleting the file. I'm also not sure if /tmp having too many links is the level of problem we need to worry about? It seems like a very unlikely case to me, but I'm not sure. Just my thoughts. Looks like it causes this bug: https://issues.apache.org/jira/browse/MESOS-2319 - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29918/#review69435 --- On Jan. 24, 2015, 1:21 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29918/ --- (Updated Jan. 24, 2015, 1:21 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and Vinod Kone. Repository: mesos Description --- Introduced a generic checkpoint function. Diffs - src/slave/state.hpp 70777cf6ab681c29ca4df601fe47903e1dbdf41f src/slave/state.cpp a36fa53099300ee03f051b0f5eaaafe9f1da68d1 src/tests/slave_recovery_tests.cpp 809822e63b05a21418cd9297c927d656d6fd871d Diff: https://reviews.apache.org/r/29918/diff/ Testing --- make check Thanks, Michael Park
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote: src/master/master.cpp, lines 197-246 https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197 It looks like we only log when we're shutting down the slave, when there's no rate limiter? Have you considered having a single shutdown code path? This gets you a single dispatch, and a single log statement and it looks like it's less control flow? ``` void shutdown() { if (shuttingDown.isNone()) { FutureNothing future = Nothing(); if (limiter.isSome()) { LOG(INFO) Scheduling shutdown of slave slaveId due to health check timeout; future = limiter.get()-acquire(); } shuttingDown = future.onAny(defer(self(), Self::_shutdown)); ++metrics-slave_shutdowns_scheduled; } } void _shutdown() { CHECK_SOME(shuttingDown); const FutureNothing future = shuttingDown.get(); CHECK(!future.isFailed()) future.failure(); if (future.isReady()) { LOG(INFO) Shutting down slave slaveId due to health check timeout; dispatch(master, Master::shutdownSlave, slaveId, health check timed out); } else if (future.isDiscarded()) { // Shutdown was canceled. LOG(INFO) Canceling shutdown of slave slaveId since a pong is received!; ++metrics-slave_shutdowns_canceled; } shuttingDown = None(); } ``` Having both 'future' and 'shuttingDown' in 'shutdown' method looks confusing to me:) You need to comment about why you need that future. Or, I think having dup code is not a too bad thing in this case as the logic is more clear IMO. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71021 --- On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 4, 2015, 1:51 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/ --- (Updated Feb. 4, 2015, 7:13 p.m.) Review request for mesos and Ben Mahler. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Started to maintain and checkpoint persisted resource in slave. That includes: 1) responds to update resources message 2) checkpoint resources 3) recover checkpointed resources 4) send checkpointed resources during register/reregister Diffs (updated) - src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 src/slave/slave.cpp a8b262174ab5c9a524db8318d3d1438cd75a702b Diff: https://reviews.apache.org/r/28809/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote: src/master/master.cpp, lines 197-246 https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197 It looks like we only log when we're shutting down the slave, when there's no rate limiter? Have you considered having a single shutdown code path? This gets you a single dispatch, and a single log statement and it looks like it's less control flow? ``` void shutdown() { if (shuttingDown.isNone()) { FutureNothing future = Nothing(); if (limiter.isSome()) { LOG(INFO) Scheduling shutdown of slave slaveId due to health check timeout; future = limiter.get()-acquire(); } shuttingDown = future.onAny(defer(self(), Self::_shutdown)); ++metrics-slave_shutdowns_scheduled; } } void _shutdown() { CHECK_SOME(shuttingDown); const FutureNothing future = shuttingDown.get(); CHECK(!future.isFailed()) future.failure(); if (future.isReady()) { LOG(INFO) Shutting down slave slaveId due to health check timeout; dispatch(master, Master::shutdownSlave, slaveId, health check timed out); } else if (future.isDiscarded()) { // Shutdown was canceled. LOG(INFO) Canceling shutdown of slave slaveId since a pong is received!; ++metrics-slave_shutdowns_canceled; } shuttingDown = None(); } ``` Jie Yu wrote: Having both 'future' and 'shuttingDown' in 'shutdown' method looks confusing to me:) You need to comment about why you need that future. Or, I think having dup code is not a too bad thing in this case as the logic is more clear IMO. Another option to avoid dup code is: 1) s/shutdown/scheduleShutdown 2) let 'shutdown' be the actual shutdown (sending message) - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71021 --- On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 4, 2015, 1:51 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Review Request 30635: Fixed MESOS-2319 by creating the temporary file under the same base directory.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30635/ --- Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2319 https://issues.apache.org/jira/browse/MESOS-2319 Repository: mesos Description --- Fixed MESOS-2319 by creating the temporary file under the same base directory. Diffs - src/slave/state.hpp de631fb2c8a8d2bcbb861c438b18141ba6211024 Diff: https://reviews.apache.org/r/30635/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote: src/master/master.cpp, lines 197-246 https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197 It looks like we only log when we're shutting down the slave, when there's no rate limiter? Have you considered having a single shutdown code path? This gets you a single dispatch, and a single log statement and it looks like it's less control flow? ``` void shutdown() { if (shuttingDown.isNone()) { FutureNothing future = Nothing(); if (limiter.isSome()) { LOG(INFO) Scheduling shutdown of slave slaveId due to health check timeout; future = limiter.get()-acquire(); } shuttingDown = future.onAny(defer(self(), Self::_shutdown)); ++metrics-slave_shutdowns_scheduled; } } void _shutdown() { CHECK_SOME(shuttingDown); const FutureNothing future = shuttingDown.get(); CHECK(!future.isFailed()) future.failure(); if (future.isReady()) { LOG(INFO) Shutting down slave slaveId due to health check timeout; dispatch(master, Master::shutdownSlave, slaveId, health check timed out); } else if (future.isDiscarded()) { // Shutdown was canceled. LOG(INFO) Canceling shutdown of slave slaveId since a pong is received!; ++metrics-slave_shutdowns_canceled; } shuttingDown = None(); } ``` Jie Yu wrote: Having both 'future' and 'shuttingDown' in 'shutdown' method looks confusing to me:) You need to comment about why you need that future. Or, I think having dup code is not a too bad thing in this case as the logic is more clear IMO. Jie Yu wrote: Another option to avoid dup code is: 1) s/shutdown/scheduleShutdown 2) let 'shutdown' be the actual shutdown (sending message) Ben Mahler wrote: Imagine it is s/future/limited/: ``` FutureNothing limited = Nothing(); if (limiter.isSome()) { limited = limiter.get()-acquire(); } shuttingDown = limited.onAny(defer(self(), Self::_shutdown)); ``` Seems pretty clear? Can you flush out the second option you mentioned in a code snippet? I didn't follow. Jie Yu wrote: Yeah, that reads much better. Vinod Kone wrote: hmm. i actually liked the explicitness of the earlier version. needed less cognitive overhead. but also liked the proposal of always calling _shutdown(). here's my proposal: ``` void shutdown() { if (shuttingDown.isSome()) { // Shutdown is already in progress. return; } if (limiter.isSome()) { // If a rate limit is specified schedule the shutdown according // to the rate limit. LOG(INFO) Scheduling shutdown of slave slaveId due to health check timeout; shuttingDown = limiter.get()-acquire(); } else { // If no rate limit is specified shutdown right away. shuttingDown = FutureNothing(Nothing()); // Ready future. } shuttingDown.get() .onAny(defer(self(), Self::_shutdown)); } ``` what do you guys think? I like the idea, but this won't compile since shuttingDown.get() returns a const ref and onAny is a non-const function ;-) - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71021 --- On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 4, 2015, 1:51 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp
Re: Review Request 30635: Fixed MESOS-2319 by creating the temporary file under the same base directory.
On Feb. 4, 2015, 8:31 p.m., Ben Mahler wrote: Is there a test of this function? Might be nice to add a NOTE or TODO about how to address the issue of leaving dangling files when we failover at the wrong time? Jie Yu wrote: Yes, SlaveStateTest should catch the regression. Will add a NOTE, the dangline file will most likely be GCed with the directory since we cannot recover the task/executor. In fact, this is not true for resources.info. Will add a TODO to cleanup dangling temp files. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30635/#review71035 --- On Feb. 4, 2015, 7:38 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30635/ --- (Updated Feb. 4, 2015, 7:38 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2319 https://issues.apache.org/jira/browse/MESOS-2319 Repository: mesos Description --- Fixed MESOS-2319 by creating the temporary file under the same base directory. Diffs - src/slave/state.hpp de631fb2c8a8d2bcbb861c438b18141ba6211024 Diff: https://reviews.apache.org/r/30635/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30386: Added support for CREATE operation in master.
On Feb. 4, 2015, 8:56 p.m., Ben Mahler wrote: src/master/master.cpp, line 1348 https://reviews.apache.org/r/30386/diff/2/?file=846897#file846897line1348 Maybe we should add a little note there that there is no direct feedback to the framework when an operation is dropped, they will find out that the operation was dropped through subsequent offers. Added a note. On Feb. 4, 2015, 8:56 p.m., Ben Mahler wrote: src/master/master.cpp, lines 4610-4613 https://reviews.apache.org/r/30386/diff/2/?file=846897#file846897line4610 Are you planning to keep this variable? Seems like every operation will require checkpointing for now, hence the name of the method being `updateCheckpointedResource`? :) Killed and use CHECK in default case. On Feb. 4, 2015, 8:56 p.m., Ben Mahler wrote: src/master/master.cpp, lines 4616-4627 https://reviews.apache.org/r/30386/diff/2/?file=846897#file846897line4616 Hm.. seems like this method should enforce that the operation type is supported (via CHECK or LOG(FATAL))? The caller is expected not to call this with anything that might get dropped in here, right? Yes, add LOG(FATAL) in the default case. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30386/#review71040 --- On Feb. 4, 2015, 1:09 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30386/ --- (Updated Feb. 4, 2015, 1:09 a.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Added support for CREATE operation in master. Diffs - src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b Diff: https://reviews.apache.org/r/30386/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30635: Fixed MESOS-2319 by creating the temporary file under the same base directory.
On Feb. 4, 2015, 8:31 p.m., Ben Mahler wrote: Is there a test of this function? Might be nice to add a NOTE or TODO about how to address the issue of leaving dangling files when we failover at the wrong time? Yes, SlaveStateTest should catch the regression. Will add a NOTE, the dangline file will most likely be GCed with the directory since we cannot recover the task/executor. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30635/#review71035 --- On Feb. 4, 2015, 7:38 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30635/ --- (Updated Feb. 4, 2015, 7:38 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2319 https://issues.apache.org/jira/browse/MESOS-2319 Repository: mesos Description --- Fixed MESOS-2319 by creating the temporary file under the same base directory. Diffs - src/slave/state.hpp de631fb2c8a8d2bcbb861c438b18141ba6211024 Diff: https://reviews.apache.org/r/30635/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote: src/master/master.cpp, lines 197-246 https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197 It looks like we only log when we're shutting down the slave, when there's no rate limiter? Have you considered having a single shutdown code path? This gets you a single dispatch, and a single log statement and it looks like it's less control flow? ``` void shutdown() { if (shuttingDown.isNone()) { FutureNothing future = Nothing(); if (limiter.isSome()) { LOG(INFO) Scheduling shutdown of slave slaveId due to health check timeout; future = limiter.get()-acquire(); } shuttingDown = future.onAny(defer(self(), Self::_shutdown)); ++metrics-slave_shutdowns_scheduled; } } void _shutdown() { CHECK_SOME(shuttingDown); const FutureNothing future = shuttingDown.get(); CHECK(!future.isFailed()) future.failure(); if (future.isReady()) { LOG(INFO) Shutting down slave slaveId due to health check timeout; dispatch(master, Master::shutdownSlave, slaveId, health check timed out); } else if (future.isDiscarded()) { // Shutdown was canceled. LOG(INFO) Canceling shutdown of slave slaveId since a pong is received!; ++metrics-slave_shutdowns_canceled; } shuttingDown = None(); } ``` Jie Yu wrote: Having both 'future' and 'shuttingDown' in 'shutdown' method looks confusing to me:) You need to comment about why you need that future. Or, I think having dup code is not a too bad thing in this case as the logic is more clear IMO. Jie Yu wrote: Another option to avoid dup code is: 1) s/shutdown/scheduleShutdown 2) let 'shutdown' be the actual shutdown (sending message) Ben Mahler wrote: Imagine it is s/future/limited/: ``` FutureNothing limited = Nothing(); if (limiter.isSome()) { limited = limiter.get()-acquire(); } shuttingDown = limited.onAny(defer(self(), Self::_shutdown)); ``` Seems pretty clear? Can you flush out the second option you mentioned in a code snippet? I didn't follow. Yeah, that reads much better. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71021 --- On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 4, 2015, 1:51 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 4, 2015, 10:57 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Changes --- Rebased. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs (updated) - src/slave/containerizer/mesos/containerizer.hpp 569119fc591ecb118ecff51b8b19d0a1b8090cb5 src/slave/containerizer/mesos/containerizer.cpp d712278428889ebdfd598537690138329d8464f0 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30545: cgroups: added support to listen on memory pressures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/#review71104 --- Per our dicussion, could you expose and reuse the existing EventListener? Here are the thoughts: 1) s/EventListener/EventListenerProcess 2) expose EventListner (the wrapper for EventListenerProcess) in the header file 3) introduce a Futureuint64_t read() function for EventListener (and EventListenerProcess) 4) The existing 'listen' function can be implemented as follows: create an EventListener, call read(), register an onAny for the returned future to delete EventListener 5) You need to be careful if any read fails/discarded. You probably want to fail all the subsequent reads if that happens (i.e., having a flag inside EventListenerProcess and return failed future for read() if the flag is set). - Jie Yu On Feb. 4, 2015, 7:50 p.m., Chi Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30545/ --- (Updated Feb. 4, 2015, 7:50 p.m.) Review request for mesos, Dominic Hamon, Ian Downes, and Jie Yu. Bugs: MESOS-2136 https://issues.apache.org/jira/browse/MESOS-2136 Repository: mesos Description --- cgroups: added support to listen on memory pressures. Diffs - src/linux/cgroups.hpp abf31df src/linux/cgroups.cpp 0b136e1 Diff: https://reviews.apache.org/r/30545/diff/ Testing --- Thanks, Chi Zhang
Re: Review Request 30642: Added validation for DESTROY operation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30642/ --- (Updated Feb. 4, 2015, 11:52 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Updated. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for DESTROY operation. Diffs (updated) - src/master/validation.hpp 81dc7eedb23ccfd61666280f6cea511cef7eb4ff src/master/validation.cpp 550782ee7f13271f050a6374a33d36133da10f03 src/tests/master_validation_tests.cpp 12773110a61c4478fa968a07a969bd3127d73e5e Diff: https://reviews.apache.org/r/30642/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
On Feb. 4, 2015, 11:01 p.m., Ben Mahler wrote: src/master/master.cpp, lines 210-223 https://reviews.apache.org/r/30514/diff/6/?file=849030#file849030line210 The .get() seems a bit hacky to me (as Jie said it's not really const), it seems nice to set `shuttingDown` once at the end of the method to the correct future. The FutureNothing(Nothing()) also looks a bit cluttered? This was the motivation for obtaining the 'limited' or 'throttle' future first: ``` FutureNothing acquire = Nothing(); if (limiter.isSome()) { acquire = limiter.get()-acquire(); } shuttingDown = acquire.onAny(defer(self(), Self::_shutdown)); ``` Also, I'm not sure what the comments are adding over what's in the code? Thoughts? If the LOG is not a concern, another option is: ``` shuttingDown = (limiter.isSome() ? limiter.get()-acquire() : Nothing()) .onAny(defer(self(), Self::_shutdown)); ``` - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71059 --- On Feb. 4, 2015, 10:43 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 4, 2015, 10:43 p.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp 69b945df58c173322b95aeae9cf4ed70329c4bb3 src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 30459: Refactored task/offer/resource valiation in master.
On Jan. 30, 2015, 10:32 p.m., Dominic Hamon wrote: src/master/master.hpp, line 518 https://reviews.apache.org/r/30459/diff/2/?file=842079#file842079line518 const Offer* ? I'll add a TODO. Right now, I need to change a bunch of funcitons to make it const. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30459/#review70429 --- On Jan. 31, 2015, 2:09 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30459/ --- (Updated Jan. 31, 2015, 2:09 a.m.) Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone. Bugs: MESOS-2305 https://issues.apache.org/jira/browse/MESOS-2305 Repository: mesos-incubating Description --- Refactored task/offer/resource valiation in master. See ticket for motivation. Diffs - src/Makefile.am 07bea1fb8f0035413f2119859e16fa4f9383f68e src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/master/validation.hpp PRE-CREATION src/master/validation.cpp PRE-CREATION Diff: https://reviews.apache.org/r/30459/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review70776 --- Ship it! src/master/master.cpp https://reviews.apache.org/r/30514/#comment116201 indent. align the `` src/tests/slave_tests.cpp https://reviews.apache.org/r/30514/#comment116202 Kill process:: prefix by using namespace process? - Jie Yu On Feb. 3, 2015, 2:39 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 3, 2015, 2:39 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review70775 --- src/master/flags.hpp https://reviews.apache.org/r/30514/#comment116200 Good point. I am wondering should we introduce a Rate abstraction in stout: ``` Rate rate1(3, Minites(3)); Rate rate2(10, Hours(1)); ``` And the RateLimiter can take a Rate instance: ``` RateLimiter limiter(rate2); ``` And we can add parser for Rate. - Jie Yu On Feb. 3, 2015, 2:39 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 3, 2015, 2:39 a.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 30513: Added validation for CREATE offer operation.
On Feb. 2, 2015, 6:43 p.m., Dominic Hamon wrote: src/master/validation.cpp, line 126 https://reviews.apache.org/r/30513/diff/1/?file=843897#file843897line126 s/resource/volumes/ .. the current code reads as if you're checking that all resources are persistent volumes, which confused me. Fixed. On Feb. 2, 2015, 6:43 p.m., Dominic Hamon wrote: src/master/validation.cpp, line 544 https://reviews.apache.org/r/30513/diff/1/?file=843897#file843897line544 if this is all you need, can you pass in the persisted resources instead of a slave? might make future testing easier. Vinod Kone wrote: +1 why does this need to take a Slave* ? Fixed. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/#review70600 --- On Feb. 3, 2015, 11:15 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 3, 2015, 11:15 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30513: Added validation for CREATE offer operation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 3, 2015, 11:15 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Review comments. Added/refactored the tests. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs (updated) - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30513: Added validation for CREATE offer operation.
On Feb. 3, 2015, 8:15 p.m., Vinod Kone wrote: src/master/validation.cpp, line 132 https://reviews.apache.org/r/30513/diff/1/?file=843897#file843897line132 ditto. It's clear that resource.name() is 'disk' in this case. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/#review70815 --- On Feb. 3, 2015, 11:15 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 3, 2015, 11:15 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30513: Added validation for CREATE offer operation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 3, 2015, 11:19 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Rebased. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs (updated) - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30513: Added validation for CREATE offer operation.
On Feb. 3, 2015, 8:15 p.m., Vinod Kone wrote: Test? Added/refactored tests. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/#review70815 --- On Feb. 3, 2015, 11:15 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 3, 2015, 11:15 p.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30584: Added metrics for slave shutdowns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30584/#review70855 --- src/master/master.cpp https://reviews.apache.org/r/30584/#comment116302 To be consistent, either use shared_ptr for Metrics, or make RateLimiter a raw pointer. src/master/master.cpp https://reviews.apache.org/r/30584/#comment116303 This doesn't seem to necessary since we have master/slave_removals already (which will be incremented in removeSlave(slave) below)? - Jie Yu On Feb. 3, 2015, 10:36 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30584/ --- (Updated Feb. 3, 2015, 10:36 p.m.) Review request for mesos, Ben Mahler and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- See summary. Diffs - src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/master/metrics.hpp 6a43abc914dce24c60b5db57ee01d172c8258e82 src/master/metrics.cpp 956fe5042d7478d90d4f03059e248694ba2b95e5 Diff: https://reviews.apache.org/r/30584/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 30584: Added metrics for slave shutdowns.
On Feb. 3, 2015, 11:37 p.m., Jie Yu wrote: src/master/master.cpp, line 3908 https://reviews.apache.org/r/30584/diff/1/?file=846680#file846680line3908 This doesn't seem to necessary since we have master/slave_removals already (which will be incremented in removeSlave(slave) below)? Vinod Kone wrote: Yea, wanted to use another specific one for shutdowns because slaves are also removed without calling shutdown, e.g. reregisterSlave(). Although slave_shutdowns_scheduled - slave_shutdowns_canceled will give this number. So I'm ok with killing it altogether, if that's confusing. LMK. Yeah, I would suggest removing it since we can derive it from other metrics. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30584/#review70855 --- On Feb. 4, 2015, 12:24 a.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30584/ --- (Updated Feb. 4, 2015, 12:24 a.m.) Review request for mesos, Ben Mahler and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- See summary. Diffs - src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/master/metrics.hpp 6a43abc914dce24c60b5db57ee01d172c8258e82 src/master/metrics.cpp 956fe5042d7478d90d4f03059e248694ba2b95e5 Diff: https://reviews.apache.org/r/30584/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 30386: Added support for CREATE operation in master.
On Jan. 29, 2015, 6:49 p.m., Ben Mahler wrote: src/master/master.cpp, lines 1342-1353 https://reviews.apache.org/r/30386/diff/1/?file=839423#file839423line1342 Nice :) Do you need to pass the slave here? Can you remove it for now? Killed. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30386/#review70144 --- On Jan. 29, 2015, 12:12 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30386/ --- (Updated Jan. 29, 2015, 12:12 a.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Added support for CREATE operation in master. Diffs - src/master/master.hpp 1d342e56116ad63aade43484b6899ce26f25abfd src/master/master.cpp 54f26900ac8c63e79a1f89562a988c9a2567d209 Diff: https://reviews.apache.org/r/30386/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30386: Added support for CREATE operation in master.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30386/ --- (Updated Feb. 4, 2015, 1:09 a.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Rebased. BenM's comments. Repository: mesos Description --- Added support for CREATE operation in master. Diffs (updated) - src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b Diff: https://reviews.apache.org/r/30386/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30513: Added validation for CREATE offer operation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30513/ --- (Updated Feb. 4, 2015, 1:10 a.m.) Review request for mesos, Ben Mahler, Michael Park, and Vinod Kone. Changes --- Killed another unneeded test. Bugs: MESOS-2100 https://issues.apache.org/jira/browse/MESOS-2100 Repository: mesos Description --- Added validation for CREATE offer operation. This depends on the master validator refactor. Tests will be in a different patch. Diffs (updated) - src/Makefile.am 5b1885de132d527c75fc6431cd2d377f0cb7f242 src/master/validation.hpp 642c37564375868fae4e398bc4dc8edf2f31c8a4 src/master/validation.cpp 8804ba6cf0315a22943c25ec24ef594f1cfadf83 src/tests/master_validation_tests.cpp PRE-CREATION src/tests/resource_offers_tests.cpp 24a7eabd40b11f4bbbf30b8251561e73683eef9a Diff: https://reviews.apache.org/r/30513/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30531: Remove strings::format and unnecessary constants from paths
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/#review70885 --- Can we seperate the tests changes in this patch? It is a good practice not to change tests for refactors (if not necessary) so that we can capture potentiall regressions and have more confidence about the refactor? The rest looks good to me! Thanks for the cleanup! src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116331 const string directory src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116332 const string last src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116333 ditto. src/slave/paths.cpp https://reviews.apache.org/r/30531/#comment116334 ditto. src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116339 Two lines between top level functions. Here and everywhere else. src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116335 const string src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116337 `{` should be in next line. Here and everywhere else. src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116340 const string src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116346 const string src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116342 The format here is a little jagged, how about ``` EXPECT_EQ( path::join( executorsRoot, executorId.value(), runs, containerId.value()), paths::getExecutorRunPath( rootDir, slaveId, frameworkId, executorId, containerId)); ``` src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116344 Ditto here on the jaggedness. src/tests/paths_tests.cpp https://reviews.apache.org/r/30531/#comment116345 const string - Jie Yu On Feb. 2, 2015, 11:40 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30531/ --- (Updated Feb. 2, 2015, 11:40 p.m.) Review request for mesos and Jie Yu. Bugs: MESOS-2314 https://issues.apache.org/jira/browse/MESOS-2314 Repository: mesos Description --- Remove strings::format and unnecessary constants from paths Diffs - src/slave/paths.hpp 4d6897f9eebcd6852f0223bae23a29729a42e750 src/slave/paths.cpp fe951abac1254748dbd5bdd81b7e50da75afad6d src/tests/paths_tests.cpp 417aa51b99d54055ad7254b4f274fb59d0794ca6 Diff: https://reviews.apache.org/r/30531/diff/ Testing --- Thanks, Dominic Hamon
Re: Review Request 30509: Added executor working directory to Container struct in Mesos containerizer.
On Feb. 2, 2015, 7:56 p.m., Ian Downes wrote: Is this the executor work directory from the host's perspective or from the container's perspective? They will be different if the container has a different root filesystem... It's the host executor working directory. This is the 'directory' passed to the Containerizer (e.g., launch) - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/#review70614 --- On Feb. 2, 2015, 6:37 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/ --- (Updated Feb. 2, 2015, 6:37 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Added executor working directory to Container struct in Mesos containerizer. The executor directory is needed when linking persistent volumes inside the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp 802988c90ac872b0cefa5e28f06e6fec98e8d032 src/slave/containerizer/mesos/containerizer.cpp d712278428889ebdfd598537690138329d8464f0 Diff: https://reviews.apache.org/r/30509/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review70603 --- src/master/flags.hpp https://reviews.apache.org/r/30514/#comment115811 The maximum rate? Does it make sense to you to rename the flag name as well? slave_removal_rate_limit? src/master/master.hpp https://reviews.apache.org/r/30514/#comment115816 The moment you pass it to the SlaveObserver, the ownership of the pointer is transferred and you cannot access to that anymore. I guess here you really don't have many options but using a raw pointer (and let the slave manages its lifecycle). I guess we did the same thing for Registrar, WhitelistWatcher, Allocator, etc. src/master/master.cpp https://reviews.apache.org/r/30514/#comment115817 See my above comments. You probably want to use the raw pointer and add a NOTE about the ownership is controlled by the master. OptionRateLimiter* _limiter src/master/master.cpp https://reviews.apache.org/r/30514/#comment115849 Can we simply the timeout logic and move all the limiter related stuff to `shutdown` function? ``` void timeout() { if (pinged ++timeouts = XXX) { // Schedule a shutdown of the slave. shutdown(); } // NOTE: We keep sending PINGs even if a shutdown has // been scheduled so that we can cancel the shutdown // if the slave becomes reachable again. ping(); } void shutdown() { if (limiter.isNone()) { LOG(INFO) ...; dispatch(master, Master::shutdownSlave, ...); return; } if (shuttingDown.isSome()) { // A shutdown has been scheduled already. return; } shuttingDown = limiter.get()-acquire() .onAny(defer(self(), Self::_shutdown, ...); } void _shutdown(const FutureNothing future) { if (future.isReady()) { // Shutdown the slave. } else if (future.isFailed()) { // Not expected to happen? } else if (future.isDiscarded()) { // Do nothing here, maybe print a message. } shuttingDown = None(); } ``` src/master/master.cpp https://reviews.apache.org/r/30514/#comment115842 Could you add a NOTE about this function: ``` NOTE: This function schedules a shutdown of the slave. The shutdown of a slave is rate limited (MESOS-). The shutdown can be cancelled if a pong is received before the actually shutdown is called. ``` src/tests/slave_tests.cpp https://reviews.apache.org/r/30514/#comment115852 indent? src/tests/slave_tests.cpp https://reviews.apache.org/r/30514/#comment115851 indent? - Jie Yu On Feb. 2, 2015, 6:51 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/ --- (Updated Feb. 2, 2015, 6:51 p.m.) Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. Bugs: MESOS-1148 https://issues.apache.org/jira/browse/MESOS-1148 Repository: mesos Description --- The algorithm is simple. All the slave observers share a rate limiter whose rate is configured via command line. When a slave times out on health check, a permit is acquired to shutdown the slave *but* the pings are continued to be sent. If a pong arrives before the permit is satisifed, the shutdown is cancelled. Diffs - src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 Diff: https://reviews.apache.org/r/30514/diff/ Testing --- make check Ran the new tests 100 times Thanks, Vinod Kone
Re: Review Request 30512: Fixed DiskUsageCollectorTest.SymbolicLink so that it works on both ext3 and ext4.
On Feb. 2, 2015, 7:46 p.m., Vinod Kone wrote: Can you briefly mention what the fix is, in the description? Added. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30512/#review70609 --- On Feb. 2, 2015, 7:48 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30512/ --- (Updated Feb. 2, 2015, 7:48 p.m.) Review request for mesos, Ben Mahler and Vinod Kone. Bugs: MESOS-2241 https://issues.apache.org/jira/browse/MESOS-2241 Repository: mesos Description --- Fixed DiskUsageCollectorTest.SymbolicLink so that it works on both ext3 and ext4. The fix is to change the file size to be large enough so that the slight discrepency between ext3 and ext4 (for meta data) shouldn't affect the expections. Diffs - src/tests/disk_quota_tests.cpp 83a98447644a5fa46bffdc7f2ed73bb8411841f5 Diff: https://reviews.apache.org/r/30512/diff/ Testing --- make check on both ext3 and ext4 systems. Thanks, Jie Yu
Re: Review Request 30512: Fixed DiskUsageCollectorTest.SymbolicLink so that it works on both ext3 and ext4.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30512/ --- (Updated Feb. 2, 2015, 7:48 p.m.) Review request for mesos, Ben Mahler and Vinod Kone. Bugs: MESOS-2241 https://issues.apache.org/jira/browse/MESOS-2241 Repository: mesos Description (updated) --- Fixed DiskUsageCollectorTest.SymbolicLink so that it works on both ext3 and ext4. The fix is to change the file size to be large enough so that the slight discrepency between ext3 and ext4 (for meta data) shouldn't affect the expections. Diffs - src/tests/disk_quota_tests.cpp 83a98447644a5fa46bffdc7f2ed73bb8411841f5 Diff: https://reviews.apache.org/r/30512/diff/ Testing --- make check on both ext3 and ext4 systems. Thanks, Jie Yu
Re: Review Request 30509: Added executor working directory to Container struct in Mesos containerizer.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30509/ --- (Updated Feb. 2, 2015, 8:06 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Changes --- removed circular dependency in depends on field -- @vinodkone Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Added executor working directory to Container struct in Mesos containerizer. The executor directory is needed when linking persistent volumes inside the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp 802988c90ac872b0cefa5e28f06e6fec98e8d032 src/slave/containerizer/mesos/containerizer.cpp d712278428889ebdfd598537690138329d8464f0 Diff: https://reviews.apache.org/r/30509/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 28608: Allowed slave to create and link persistent volumes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28608/#review70645 --- Replaced by https://reviews.apache.org/r/30510/ (do that in the containerizer). - Jie Yu On Dec. 2, 2014, 10:01 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28608/ --- (Updated Dec. 2, 2014, 10:01 p.m.) Review request for mesos and Ben Mahler. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed slave to create and link persistent volumes. Diffs - src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 src/slave/slave.cpp ed63dedbda0bf548a95de7d39002ac56a29303e5 Diff: https://reviews.apache.org/r/28608/diff/ Testing --- make check Thanks, Jie Yu
Re: Review Request 30510: Allowed Mesos containerizer to prepare and update volumes.
On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.hpp, lines 221-226 https://reviews.apache.org/r/30510/diff/1/?file=843873#file843873line221 In my current working code I've added an Optionstring rootfs to struct Container. Would that be sufficient? Sounds good to me! Can you pull this change out and we can commit that first? On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.cpp, line 1196 https://reviews.apache.org/r/30510/diff/1/?file=843874#file843874line1196 There ordering considerations here for container paths which have common ancestory beneath the container. this won't work: /bar - container/foo/bar # fails because foo doesn't yet exist /foo - container/foo this works: /foo - container/foo /bar - container/foo/bar # succeeds, symlink is created inside /foo The filesystem/shared isolator explicitly chose to not implement re-ordering but it does check whether this is attempted and returns an error stating such. I'll disable all nested symlinks in master. I'll add a NOTE here. On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.cpp, line 1213 https://reviews.apache.org/r/30510/diff/1/?file=843874#file843874line1213 This requires the parent of the joined target to exist? ln -s /persistent1 /container/foo/persistent will fail if /container/foo doesn't exist I'll disable nested symlinks. I'll add a NOTE here too. On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.cpp, line 1215 https://reviews.apache.org/r/30510/diff/1/?file=843874#file843874line1215 I presume this always returns an absolute path? Yes, it is (if not, we need to make sure it is). On Feb. 2, 2015, 9:20 p.m., Ian Downes wrote: src/slave/containerizer/mesos/containerizer.cpp, lines 1161-1167 https://reviews.apache.org/r/30510/diff/1/?file=843874#file843874line1161 How can we ensure that such a filesystem isolator is being used? Shouldn't all of this functionality be in one place!? For now, in master, we will only allow relative non-nested container path. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/#review70617 --- On Feb. 2, 2015, 6:36 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30510/ --- (Updated Feb. 2, 2015, 6:36 p.m.) Review request for mesos, Ben Mahler, Ian Downes, Michael Park, and Vinod Kone. Bugs: MESOS-2031 https://issues.apache.org/jira/browse/MESOS-2031 Repository: mesos Description --- Allowed Mesos containerizer to prepare and update volumes. Mesos containerizer setup volumes for the container. Diffs - src/slave/containerizer/mesos/containerizer.hpp 802988c90ac872b0cefa5e28f06e6fec98e8d032 src/slave/containerizer/mesos/containerizer.cpp d712278428889ebdfd598537690138329d8464f0 Diff: https://reviews.apache.org/r/30510/diff/ Testing --- make check Thanks, Jie Yu
Review Request 30536: Renamed persisted resources to checkpointed resources.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30536/ --- Review request for mesos, Benjamin Hindman, Ben Mahler, Michael Park, and Vinod Kone. Repository: mesos Description --- Renamed persisted resources to checkpointed resources per our discussion. This is to avoid confusion with persistent volumes. Diffs - src/master/allocator.hpp 318a756e0a8ca1bba6d1144c8160ff24a6b6e646 src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 src/messages/messages.proto c609f500e5e999a7587feb8acfb62aa637824c0a Diff: https://reviews.apache.org/r/30536/diff/ Testing --- make check Thanks, Jie Yu