Hi everyone,

There are two documents on dynamic reservations as a first step towards 
persistent services:

·         RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy

·         Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v

For a couple of days now, there have also been two patches by Dmitriy online 
for an MVP:

·         https://reviews.apache.org/r/56690/

·         https://reviews.apache.org/r/56691/

From reading the documents, I am under the impression that there is a rough 
consensus on the following points:

·         We want dynamic reservations. Our general goal is to enable the 
re-scheduling of tasks on the same host they used in a previous run.

·         Dynamic reservations are a best-effort feature. If in doubt, a task 
will be scheduled somewhere else.

·         Jobs opt into reserved resources using an appropriate tier config.

·         The tier config is supposed to be neither preemptible nor revocable. 
Reserving resources therefore requires appropriate quota.

·         Aurora will tag reserved Mesos resources by adding the unique 
instance key of the reserving task instance as a label. Only this task instance 
will be allowed to use those tagged resources.
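To make the tagging rule concrete, here is a minimal Python sketch of how I understand the matching. Everything in it is an assumption for illustration: the label name `instance_key`, the `role/env/name/instance` rendering of the key, and the `Resource`/`InstanceKey` types are hypothetical and are neither the Mesos protobuf API nor the code in Dmitriy's patches.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical label name; the patches may use a different key.
RESERVATION_LABEL = "instance_key"


@dataclass(frozen=True)
class InstanceKey:
    role: str
    env: str
    name: str
    instance: int

    def render(self) -> str:
        # Assumed textual form of the unique instance key.
        return f"{self.role}/{self.env}/{self.name}/{self.instance}"


@dataclass
class Resource:
    name: str  # e.g. "cpus", "mem"
    amount: float
    labels: Dict[str, str] = field(default_factory=dict)

    def reservation_owner(self) -> Optional[str]:
        return self.labels.get(RESERVATION_LABEL)


def reserve(resource: Resource, key: InstanceKey) -> Resource:
    """Return a copy of the resource tagged for exclusive use by `key`."""
    labels = dict(resource.labels)
    labels[RESERVATION_LABEL] = key.render()
    return Resource(resource.name, resource.amount, labels)


def may_use(resource: Resource, key: InstanceKey) -> bool:
    """Assumption: untagged resources are open to any task;
    tagged resources may only be used by the instance named in the label."""
    owner = resource.reservation_owner()
    return owner is None or owner == key.render()


# Example usage: instance 0 reserves CPUs that instance 1 may not use.
web0 = InstanceKey("www-data", "prod", "hello", 0)
web1 = InstanceKey("www-data", "prod", "hello", 1)
cpus = reserve(Resource("cpus", 2.0), web0)
assert may_use(cpus, web0)
assert not may_use(cpus, web1)
assert may_use(Resource("mem", 512.0), web1)
```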

I am unclear on the following general questions, as the documents contain 
contradictory content:

a)       How does the user interact with reservations? There are several 
proposals in the documents to auto-reserve on `aurora job create` or `aurora 
cron schedule` and to automatically un-reserve on the corresponding reverse 
actions. But will we also give users further control over reservations so 
that they can manage them independently of the task/job lifecycle? For example, 
how does Borg handle this?

b)       The implementation proposal and patches include an OfferReconciler, 
which implies that we don’t want to give the user any control. The only control 
mechanism will be the cluster-wide offer wait time, which limits the number of 
seconds unused reserved resources can linger before they are un-reserved.

c)       Will we allow adhoc/cron jobs to reserve resources? Does it even 
matter if we don’t give control to users and just rely on the OfferReconciler?
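For (b), the timeout behaviour I read from the design doc can be sketched roughly as follows. This is a hypothetical illustration, not the OfferReconciler from the patches: the class name, method names and the 300-second value are made up. It tracks idle reservations by instance key rather than by offer ID, because Mesos issues a new offer ID each time resources are re-offered.

```python
import time
from typing import Callable, Dict, List


class ReservationTimeoutTracker:
    """Hypothetical sketch of the cluster-wide wait-time rule in (b)."""

    def __init__(self, max_wait_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_wait_secs = max_wait_secs
        self.clock = clock
        # instance key -> time its reserved resources were first offered unused
        self._idle_since: Dict[str, float] = {}

    def offered_unused(self, instance_key: str) -> None:
        # setdefault keeps the first sighting, so re-offers don't reset the timer.
        self._idle_since.setdefault(instance_key, self.clock())

    def used(self, instance_key: str) -> None:
        # The owning instance launched on its reservation; stop the timer.
        self._idle_since.pop(instance_key, None)

    def to_unreserve(self) -> List[str]:
        """Return (and forget) keys whose reservations sat idle too long."""
        now = self.clock()
        stale = [k for k, t in self._idle_since.items()
                 if now - t >= self.max_wait_secs]
        for k in stale:
            del self._idle_since[k]
        return stale


# Example usage with a fake clock so the timeline is deterministic.
now = [0.0]
tracker = ReservationTimeoutTracker(300, clock=lambda: now[0])
tracker.offered_unused("www-data/prod/hello/0")
tracker.offered_unused("www-data/prod/hello/1")
now[0] = 120.0
tracker.used("www-data/prod/hello/1")  # instance 1 reclaimed its host in time
now[0] = 301.0
assert tracker.to_unreserve() == ["www-data/prod/hello/0"]
assert tracker.to_unreserve() == []
```

Under this reading, the only knob is the wait time itself, which is exactly why I am asking whether users should get any control beyond it.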


I have a couple of questions on the MVP and some implementation details. I will 
follow up with those in a separate mail.

Thanks and best regards,
Stephan
