Hi everyone,

There have been two documents on Dynamic Reservations as a first step towards persistent services:
· RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
· Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v

For a couple of days now, there have also been two patches online for an MVP by Dmitriy:

· https://reviews.apache.org/r/56690/
· https://reviews.apache.org/r/56691/

From reading the documents, I am under the impression that there is a rough consensus on the following points:

· We want dynamic reservations. Our general goal is to enable rescheduling a task on the same host it used in a previous run.
· Dynamic reservations are a best-effort feature. If in doubt, a task will be scheduled somewhere else.
· Jobs opt into reserved resources using an appropriate tier config.
· The tier config is supposed to be neither preemptible nor revocable. Reserving resources therefore requires appropriate quota.
· Aurora will tag reserved Mesos resources by adding the unique instance key of the reserving task instance as a label. Only this task instance will be allowed to use those tagged resources.

I am unclear on the following general questions, as the documents contain contradicting content:

a) How does the user interact with reservations? There are several proposals in the documents to auto-reserve on `aurora job create` or `aurora cron schedule` and to automatically un-reserve on the corresponding reverse actions. But will we also give users further control over reservations, so that they can manage them independently of the task/job lifecycle? For example, how does Borg handle this?

b) The implementation proposal and patches include an OfferReconciler, which implies that we don't want to offer the user any control. The only control mechanism would be the cluster-wide offer wait time, which limits the number of seconds unused reserved resources can linger before they are un-reserved.

c) Will we allow adhoc/cron jobs to reserve resources?
Does it even matter if we don't give control to users and just rely on the OfferReconciler?

I have a couple of questions on the MVP and some implementation details. I will follow up with those in a separate mail.

Thanks and best regards,
Stephan