-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13646/
-----------------------------------------------------------
Review request for mesos and Benjamin Hindman.
Bugs: MESOS-600
https://issues.apache.org/jira/browse/mesos-600
Repository: mesos-git
Description
-------
First pass at the new isolation API. See the description at MESOS-600.
Apologies, but this was initially submitted to Twitter's internal ReviewBoard.
I've copied all the comments below:
[~bmahler]
Awesome to see this coming together!
It looks like the changes to libprocess-ify the isolator are included here. I
commented on that a bit, but it's fairly orthogonal to the isolator API changes
needed for pluggable resource isolation.
(1) As an example, let's say one wants to provide optional PID namespace
isolation. How would they accomplish that using this API? Would that be done
through a {PID,Process}ResourceIsolator? What about other namespaces?
(2) As a note, let's also try to make sure we don't dig ourselves into a hole
when it comes to having per-executor isolation schemes. This means, for
example, using CFS+shares for one executor and only shares for another.
Simply a note in the code as to how this could be done would be nice. I hinted
at this feature (making resources first class) in
https://issues.apache.org/jira/browse/MESOS-338 to enable per-executor
isolation levels.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
I've libprocess-ified the existing Isolator and will post a review shortly.
1) Yes, that's what we're thinking. Then you'd have things like a
MountResourceIsolator or UidResourceIsolator.
2) I'd like to talk through the per-executor isolation more. I was thinking
this could be exposed (to the isolator) as different resource types, e.g. 4.0
cpu shares or 2.0 cfs cpu, and then a combined cpu isolator would be able to
coordinate both.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
virtual process::Future<Exited> launch(
I think this should return a future for when the executor starts rather than
exits.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
Makes sense!
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 11:46 p.m.)
[~czhang]
A different thought on this:
Looking at cgroup_isolator.cpp, the dispatch calls back to the slave are
either executorStarted or executorTerminated. One of them could be eliminated
by launch() returning a future that gets fulfilled when an executor starts or
terminates. (I'm not an expert in libprocess; is there a way for it to prepare
two futures to be fulfilled later?)
Do we have any place other than launch() to return a future for when an
executor goes away? If not, launch() could be used for that. executorStarted
still checks the state of the executor anyway, so it shouldn't be hard to
move the logic in executorStarted to before the executor gets started.
[~czhang]
What if launch() returned a Started future, kill() returned a Terminated
future, and we added wait(), which also returns a Terminated future and covers
the case of the executor forking but dying unexpectedly?
Ian Downes 1 week, 2 days ago (Aug. 10, 2013, 12:05 a.m.)
Back to Chi's proposal, can launch() return a Started future which contains a
nested Terminated future?
Eric Biederman 1 week, 2 days ago (Aug. 10, 2013, 12:19 a.m.)
I know I tried that and failed miserably. So at the very least I was missing
the secret sauce to make it work. (Perhaps shared_ptr in the right places?)
At this point, not knowing when the process actually starts isn't particularly
interesting: we can start the timer for timing out on executor registration
before we call launch(), and remove an entire state from the state machine in
slave.cpp.
Benjamin Mahler 1 week, 2 days ago (Aug. 10, 2013, 2:07 a.m.)
Chi: Unfortunately we don't have a notion of this at the moment; one could
return a nested Future, as Ian said, but it's a bit clunky.
Let's focus on the ResourceIsolator changes in this review, making the Isolator
return Futures is an orthogonal change that we can focus on with Ian's upcoming
refactor.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
// Terminate a framework's executor. Resources are NOT automatically released
// and should be explicitly released/reduced with change().
This sounds tricky; what was the motivation for requiring explicit release of
resources?
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This was motivated by wanting to keep at least one resource (disk:
directories/lvm/raw) around after an executor finishes.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
virtual process::Future<Resources> change(
    const UUID& uuid, const Resources& resources) = 0;
Why use the UUID rather than frameworkId / executorId? I think we'd like to
have a consistent way to refer to executors instead. Ditto below and in
ResourceIsolator.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This ties in with resources persisting after an executor terminates. I was
thinking the uuid could be used to differentiate.
src/slave/api_refactor/isolator.hpp (Diff revision 1)
public:
// Get the resource allocation for an executor.
virtual process::Future<Resources> allocation(const UUID& uuid) = 0;

// Get available resources for this isolator.
virtual process::Future<Resources> totalUsage() = 0;

// Get total resources managed by this isolator.
virtual process::Future<Resources> totalAllocation() = 0;
Is there an anticipated need for these? Just curious how you expect they will
be used.
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
No, there's no strong need; they could be future additions to the API. The
motivation was the possibility of pushing more responsibility for managing
resources down into the isolator and out of the slave proper.
src/slave/api_refactor/isolators/resource_isolator.hpp (Diff revision 1)
public:
// Assign pid to this resource allocation; call in the forked context.
We likely do not want to call this in the forked context; was there a need for
that?
Ian Downes 1 week, 2 days ago (Aug. 9, 2013, 7:30 p.m.)
This was motivated by discussions with Eric. Cgroups (and most namespaces)
support moving pids in from the outside, but some namespaces require processes
to put *themselves* into the namespace.
Eric Biederman 1 week, 2 days ago (Aug. 10, 2013, 12:15 a.m.)
Just to emphasize: moving processes between cgroups is a bit iffy, especially
from the outside.
Namespaces have to be set either at fork time or from within the process
itself.
So we can probably kill the pid_t argument, as it will always be ::getpid().
src/slave/api_refactor/isolators/resource_isolator.hpp (Diff revision 1)
public:
// Change the resource, allocating/deallocating them as necessary
virtual Try<Nothing> change(const UUID& uuid, const Resource& resource) = 0;
My vote would be for "update" rather than "change".
Diffs
-----
src/slave/api_refactor/isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/dir_disk_isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/lvm_disk_isolator.hpp PRE-CREATION
src/slave/api_refactor/isolators/resource_isolator.hpp PRE-CREATION
Diff: https://reviews.apache.org/r/13646/diff/
Testing
-------
Thanks,
Ian Downes