-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13646/
-----------------------------------------------------------
Review request for mesos and Benjamin Hindman.
Bugs: MESOS-600
https://issues.apache.org/jira/browse/mesos-600
Repository: mesos-git
Description
-------
First pass at the new isolation API. See the description at MESOS-600.
Apologies, but this was initially submitted to Twitter's internal ReviewBoard.
I've copied all the comments below:
[~bmahler]
Awesome to see this coming together!
It looks like the changes to libprocess-ify the isolator are included here. I
commented on that a bit, but it's fairly orthogonal to the isolator API changes
needed for pluggable resource isolation.
(1) As an example, let's say one wants to provide optional PID namespace
isolation. How would they accomplish that using this API? Would that be done
through a {PID,Process}ResourceIsolator? What about other namespaces?
(2) As a note, let's also try to make sure we don't dig ourselves into a hole
when it comes to having per-executor isolation schemes. This means, for
example, using CFS+shares for one executor and only shares for another.
Simply a note in the code as to how this could be done would be nice. I hinted
at this feature (making resources first class) in
https://issues.apache.org/jira/browse/MESOS-338 to enable per-executor
isolation levels.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
I've libprocess-ified the existing Isolator and will post a review shortly.
1) Yes, that's what we're thinking. Then you'd have things like a
MountResourceIsolator or UidResourceIsolator.
2) I'd like to talk through the per-executor isolation more. I was thinking
this could be exposed (to the isolator) as different resource types, e.g. 4.0
cpu shares or 2.0 cfs cpu, and then a combined cpu isolator would be able to
coordinate both.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
virtual process::Future<Exited> launch(
I think this should return a future for when the executor starts rather than
exits.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
Makes sense!
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 11:46 p.m.)
[~czhang]
A different thought on this:
Looking at cgroup_isolator.cpp, the dispatch calls back to the slave are
either executorStarted or executorTerminated. One of them could be eliminated
by launch() returning a future that gets fulfilled when an executor starts or
terminates. (I'm not an expert in libprocess; is there a way for it to prepare
two futures to be fulfilled later?)
Do we have any place other than launch() to return a future for when an
executor goes away? If not, launch() could be used for that. executorStarted
still checks the state of the executor anyway, so it shouldn't be hard to
move the logic in executorStarted to before the executor gets started.
[~czhang]
What if launch() returned a Started future, kill() returned a Terminated
future, and we added wait(), which also returns a Terminated future and covers
the case of the executor forking but dying unexpectedly?
Ian Downes 1 week, 2 days ago (Aug. 10, 2013, 12:05 a.m.)
Back to Chi's proposal, can launch() return a Started future which contains a
nested Terminated future?
Eric Biederman 1 week, 2 days ago (Aug. 10, 2013, 12:19 a.m.)
I know I tried that and failed miserably. So at the very least I was missing
the secret sauce to make it work. (Perhaps shared_ptr in the right places?)
At this point, not knowing when the process actually starts isn't particularly
interesting: we can start the timer for timing out on executor registration
before we call launch(), and remove an entire state from the state machine in
slave.cpp.
Benjamin Mahler 1 week, 2 days ago (Aug. 10, 2013, 2:07 a.m.)
Chi: Unfortunately we don't have a notion of this at the moment; one could
return a nested Future, as Ian said, but it's a bit clunky.
Let's focus on the ResourceIsolator changes in this review, making the Isolator
return Futures is an orthogonal change that we can focus on with Ian's upcoming
refactor.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
// Terminate a framework's executor. Resources are NOT automatically released
// and should be explicitly released/reduced with change().
This sounds tricky; what was the motivation for requiring explicit release of
resources?
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This was motivated by wanting to keep at least one resource (disk:
directories/lvm/raw) around after an executor finishes.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
virtual process::Future<Resources> change(
    const UUID& uuid, const Resources& resources) = 0;
Why use the UUID rather than frameworkId / executorId? I think we'd like to
have a consistent way to refer to executors instead. Ditto below and in
ResourceIsolator.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This ties in with resources persisting after an executor terminates. I was
thinking the uuid could be used to differentiate.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
// Get the resource allocation for an executor.
virtual process::Future<Resources> allocation(const UUID& uuid) = 0;

// Get available resources for this isolator.
virtual process::Future<Resources> totalUsage() = 0;

// Get total resources managed by this isolator.
virtual process::Future<Resources> totalAllocation() = 0;
Is there an anticipated need for these? Just curious how you expect they will
be used.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
No, there's no strong need; they could be future additions to the API. The
motivation was the possibility of pushing more responsibility for managing
resources down into the isolator and out of the slave proper.
src/slave/api_refactor/isolators/resource_isolator.hpp (Diff revision 1)
public:
// Assign pid to this resource allocation; call in the forked context.
We likely do not want to call this in the forked context; was there a need for
that?
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This was motivated by discussions with Eric. Cgroups (and most namespaces)
support moving pids in from the outside, but some namespaces require processes
to put *themselves* into the namespace.
Eric Biederman 1 week, 2 days ago (Aug. 10, 2013, 12:15 a.m.)
Just to emphasize: moving processes between cgroups is a bit iffy, especially
from the outside.
Namespaces have to be set either at fork time or from within the process
itself.
So we can probably kill the pid_t argument, as it will always be ::getpid().
src/slave/api_refactor/isolators/resource_isolator.hpp (Diff revision 1)
public:
// Change the resource, allocating/deallocating them as necessary
virtual Try<Nothing> change(const UUID& uuid, const Resource& resource) = 0;
My vote would be for "update" rather than "change".
Diffs
-----
src/slave/api_refactor/isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/dir_disk_isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/lvm_disk_isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/resource_isolator.hpp PRE-CREATION
Diff: https://reviews.apache.org/r/13646/diff/
Testing
-------
Thanks,
Ian Downes