David, I have two concerns with that idea. First, it would require persisting the mapping from <Hostname, Resources> to <Task> for every task, and I'm not sure that adding more storage and storage operations is the ideal way to solve this problem. Second, in a multi-framework environment, a framework needs to use dynamic reservations; otherwise, the resources might be taken by another framework.
On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

> So I read the docs again and I have one major question - do we even need
> dynamic reservations for the current proposal?
>
> The current goal of the proposed work is to keep an offer on a host and
> prevent some other pending task from taking it before the next scheduling
> round. This exact problem is solved in preemption and we could use a
> similar technique for reserving offers after killing tasks when going
> through the update loop. We wouldn't need to add tiers or reconciliation or
> solve any of these other concerns. Reusing an offer skips so much of the
> expensive stuff in the Scheduler that it would be a no-brainer for the
> operator to turn it on for every single task in the cluster.
>
>
> On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniem...@apache.org> wrote:
>
> > I read over the docs, it looks like a good start. Personally I don't see
> > much of a benefit for dynamically reserved cpu/mem, but I'm excited about
> > the possibility of building off this for dynamically reserved persistent
> > volumes.
> >
> > I would like to see more detail on how a reservation "times out", and the
> > configuration options per job around that, as I feel like it's the most
> > complicated part of all of this. Ideally there would also be hooks into
> > the host maintenance APIs here.
> >
> > I also didn't see any mention of it, but I believe Mesos requires the
> > framework to reserve resources with a role. By default Aurora runs as the
> > special "*" role, does this mean Aurora will need to have a role specified
> > now for this to work? Or does Mesos allow reserving resources without a
> > role?
> >
> > On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <stephan....@blue-yonder.com> wrote:
> >
> > > Hi everyone,
> > >
> > > There have been two documents on Dynamic Reservations as a first step
> > > towards persistent services:
> > >
> > > · RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
> > >
> > > · Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
> > >
> > > For a couple of days now, there have also been two patches online for
> > > an MVP by Dmitriy:
> > >
> > > · https://reviews.apache.org/r/56690/
> > >
> > > · https://reviews.apache.org/r/56691/
> > >
> > > From reading the documents, I am under the impression that there is a
> > > rough consensus on the following points:
> > >
> > > · We want dynamic reservations. Our general goal is to enable the
> > > re-scheduling of tasks on the same host they used in a previous run.
> > >
> > > · Dynamic reservations are a best-effort feature. If in doubt, a
> > > task will be scheduled somewhere else.
> > >
> > > · Jobs opt into reserved resources using an appropriate tier
> > > config.
> > >
> > > · The tier config is supposed to be neither preemptible nor
> > > revocable. Reserving resources therefore requires appropriate quota.
> > >
> > > · Aurora will tag reserved Mesos resources by adding the unique
> > > instance key of the reserving task instance as a label. Only this task
> > > instance will be allowed to use those tagged resources.
> > >
> > > I am unclear on the following general questions as there is
> > > contradictory content:
> > >
> > > a) How does the user interact with reservations? There are several
> > > proposals in the documents to auto-reserve on `aurora job create` or
> > > `aurora cron schedule` and to automatically un-reserve on the
> > > appropriate reverse actions. But will we also allow a user further
> > > control over the reservations so that they can manage those
> > > independent of the task/job lifecycle? For example, how does Borg
> > > handle this?
> > >
> > > b) The implementation proposal and patches include an
> > > OfferReconciler, so this implies we don’t want to offer any control for
> > > the user. The only control mechanism will be the cluster-wide offer
> > > wait time limiting the number of seconds unused reserved resources can
> > > linger before they are un-reserved.
> > >
> > > c) Will we allow adhoc/cron jobs to reserve resources? Does it even
> > > matter if we don’t give control to users and just rely on the
> > > OfferReconciler?
> > >
> > >
> > > I have a couple of questions on the MVP and some implementation
> > > details. I will follow up with those in a separate mail.
> > >
> > > Thanks and best regards,
> > > Stephan
> > >
> >
>
> --
> Zameer Manji
>