I read over the docs; they look like a good start. Personally I don't see much of a benefit for dynamically reserved CPU/memory, but I'm excited about the possibility of building off this for dynamically reserved persistent volumes.
I would like to see more detail on how a reservation "times out", and the configuration options per job around that, as I feel like it's the most complicated part of all of this. Ideally there would also be hooks into the host maintenance APIs here.

I also didn't see any mention of it, but I believe Mesos requires the framework to reserve resources with a role. By default Aurora runs as the special "*" role. Does this mean Aurora will need to have a role specified now for this to work? Or does Mesos allow reserving resources without a role?

On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <stephan....@blue-yonder.com> wrote:

> Hi everyone,
>
> There have been two documents on Dynamic Reservations as a first step
> towards persistent services:
>
> · RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
>
> · Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
>
> For the past couple of days there have also been two patches online for
> an MVP by Dmitriy:
>
> · https://reviews.apache.org/r/56690/
>
> · https://reviews.apache.org/r/56691/
>
> From reading the documents, I am under the impression that there is a
> rough consensus on the following points:
>
> · We want dynamic reservations. Our general goal is to enable the
> re-scheduling of tasks on the same host they used in a previous run.
>
> · Dynamic reservations are a best-effort feature. If in doubt, a
> task will be scheduled somewhere else.
>
> · Jobs opt into reserved resources using an appropriate tier
> config.
>
> · The tier config is supposed to be neither preemptible nor
> revocable. Reserving resources therefore requires appropriate quota.
>
> · Aurora will tag reserved Mesos resources by adding the unique
> instance key of the reserving task instance as a label. Only this task
> instance will be allowed to use those tagged resources.
>
> I am unclear on the following general questions as there is contradicting
> content:
>
> a) How does the user interact with reservations? There are several
> proposals in the documents to auto-reserve on `aurora job create` or
> `aurora cron schedule` and to automatically un-reserve on the appropriate
> reverse actions. But will we also allow a user further control over the
> reservations so that they can manage those independent of the task/job
> lifecycle? For example, how does Borg handle this?
>
> b) The implementation proposal and patches include an
> OfferReconciler, so this implies we don’t want to offer any control for the
> user. The only control mechanism will be the cluster-wide offer wait time
> limiting the number of seconds unused reserved resources can linger before
> they are un-reserved.
>
> c) Will we allow adhoc/cron jobs to reserve resources? Does it even
> matter if we don’t give control to users and just rely on the
> OfferReconciler?
>
> I have a couple of questions on the MVP and some implementation details. I
> will follow up with those in a separate mail.
>
> Thanks and best regards,
> Stephan
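
To make the "times out" question a bit more concrete, here is a minimal sketch of how I read the cluster-wide offer wait time described in (b): a reservation that keeps coming back unused for longer than the wait time gets un-reserved. This is only an illustration under my own assumptions. `ReservedOffer`, `expired_reservations`, and `offer_wait_secs` are hypothetical names, not Aurora or Mesos APIs, and the real OfferReconciler in the patches may work differently.

```python
from dataclasses import dataclass


@dataclass
class ReservedOffer:
    """A reserved, currently unused offer as seen by a hypothetical reconciler."""
    instance_key: str        # label identifying the reserving task instance
    first_seen_secs: float   # when the unused reservation was first offered


def expired_reservations(offers, now_secs, offer_wait_secs):
    """Return instance keys whose reservations have lingered unused
    for at least offer_wait_secs and are candidates for un-reserving."""
    return [
        o.instance_key
        for o in offers
        if now_secs - o.first_seen_secs >= offer_wait_secs
    ]


# Hypothetical data: two reservations tagged with made-up instance keys.
offers = [
    ReservedOffer("role/env/job/0", first_seen_secs=0.0),
    ReservedOffer("role/env/job/1", first_seen_secs=50.0),
]
print(expired_reservations(offers, now_secs=60.0, offer_wait_secs=30.0))
```

A per-job option could replace the single `offer_wait_secs` value with a lookup keyed on the job, which is the kind of configuration detail I would like the design doc to spell out.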