An arbitrary job can't target a fully dedicated role with this patch; it will still get a "constraint not satisfied: dedicated" error. The constraint-matching code in the scheduler does a simple string match, so a task constraint of "*/test" will not match a host attribute of "role1/test" when placing the task; it will only match hosts whose attribute is literally "*/test".
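To make that concrete: the match is plain set membership on the literal
attribute values, with no wildcard expansion at placement time. A rough
sketch of the behavior (illustrative names only, not the actual
SchedulingFilterImpl code):

    import java.util.Set;

    // Sketch of the scheduling-time dedicated-value match: plain set
    // membership over literal strings, no wildcard expansion.
    public final class DedicatedMatchSketch {

      static boolean dedicatedMatches(Set<String> hostValues, String taskValue) {
        return hostValues.contains(taskValue);
      }

      public static void main(String[] args) {
        Set<String> wildcardHost = Set.of("*/test");
        Set<String> dedicatedHost = Set.of("role1/test");

        System.out.println(dedicatedMatches(wildcardHost, "*/test"));      // true
        System.out.println(dedicatedMatches(dedicatedHost, "*/test"));     // false: no expansion
        System.out.println(dedicatedMatches(dedicatedHost, "role1/test")); // true
      }
    }

So a job asking for "*/test" only lands on hosts whose attribute is
literally "*/test", and hosts dedicated to "role1/test" stay exclusive.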
On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Thanks for the info, Steve! Yes, it would accomplish the same goal, but
> at the price of removing the exclusive dedicated constraint enforcement.
> With this patch any job could target a fully dedicated exclusive pool,
> which may be undesirable for dedicated pool owners.
>
> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniem...@apache.org> wrote:
>
>> We've been running a trivial patch [1] that does what I believe you're
>> talking about for a while now. It allows a * for the role name,
>> basically allowing any role to match the constraint, so our constraints
>> look like "*/secure".
>>
>> Our use case is that we have a "secure" cluster of machines that is
>> constrained in what can run on it (via an external audit process) and
>> that multiple roles run on.
>>
>> I believe I had talked to Bill about this a few months ago, but I don't
>> remember where it ended up.
>>
>> [1]
>> https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>>
>> On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>
>>> Oh, I didn't mean the memory GC pressure in the pure sense, rather the
>>> logical garbage of orphaned hosts that never leave the scheduler. It's
>>> not something to be concerned about from the performance standpoint.
>>> It is, however, something operators need to be aware of when a host
>>> from a dedicated pool gets dropped or replaced.
>>>
>>> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfar...@apache.org> wrote:
>>>
>>>> What do you mean by GC burden? What I'm proposing is effectively a
>>>> Map<String, String>. Even with an extremely forgetful operator (even
>>>> more than Joe!), it would require a huge oversight to put a dent in
>>>> heap usage. I'm sure there are ways we could even expose a useful
>>>> stat to flag such an oversight.
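For what it's worth, the store Bill describes could be as small as the
sketch below. Hypothetical names, not an actual Aurora API; the explicit
release() is what would keep decommissioned hosts from lingering as the
"logical garbage" Maxim mentions above:

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // Rough sketch of a scheduler-side reservation store: hostname ->
    // dedicated value, managed through an out-of-band operator API
    // instead of slave attributes. Hypothetical; not an actual Aurora API.
    public final class DedicatedHostStore {
      private final Map<String, String> reservations = new ConcurrentHashMap<>();

      // Operator marks a machine as reserved,
      // e.g. reserve("host1", "www-data/hello").
      public void reserve(String hostname, String dedicatedValue) {
        reservations.put(hostname, dedicatedValue);
      }

      // Explicit removal keeps decommissioned hosts from piling up.
      public void release(String hostname) {
        reservations.remove(hostname);
      }

      // Consulted during task->offer matching as just another data source.
      public Optional<String> getReservation(String hostname) {
        return Optional.ofNullable(reservations.get(hostname));
      }
    }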
>>>> On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>
>>>>> Right, that's what I thought. Yes, it sounds interesting. My only
>>>>> concern is the GC burden of getting rid of hostnames that are
>>>>> obsolete and no longer exist. Relying on offers to update hostname
>>>>> 'relevance' may not work, as dedicated hosts may be fully packed and
>>>>> not release any resources for a very long time. Let me explore this
>>>>> idea a bit to see what it would take to implement.
>>>>>
>>>>> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>
>>>>>> Not a host->attribute mapping (attribute in the Mesos sense,
>>>>>> anyway). Rather an out-of-band API for marking machines as
>>>>>> reserved. For task->offer mapping it's just a matter of another
>>>>>> data source. Does that make sense?
>>>>>>
>>>>>> On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>
>>>>>>>> Can't this just be any old Constraint (not named "dedicated")?
>>>>>>>> In other words, doesn't this code already deal with
>>>>>>>> non-dedicated constraints?:
>>>>>>>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>
>>>>>>> Not really. There is a subtle difference here. A regular
>>>>>>> (non-dedicated) constraint does not prevent other tasks from
>>>>>>> landing on a given machine set, whereas dedicated keeps other
>>>>>>> tasks away by only allowing those matching the dedicated
>>>>>>> attribute. What this proposal targets is an exclusive machine
>>>>>>> pool matching any job that has this new constraint while keeping
>>>>>>> away all other tasks that don't have that attribute.
>>>>>>>
>>>>>>> Following an example from my original post, imagine a GPU machine
>>>>>>> pool. Any job (from any role) requiring the GPU resource would be
>>>>>>> allowed, while all other jobs that don't have that constraint
>>>>>>> would be vetoed.
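The asymmetry Maxim describes boils down to who triggers the check. A
simplified sketch of the two vetoes (not the actual SchedulingFilterImpl
logic):

    import java.util.Optional;
    import java.util.Set;

    // Simplified sketch of the veto asymmetry between regular and
    // dedicated constraints; not the actual SchedulingFilterImpl logic.
    public final class VetoSketch {

      // Regular constraint: only a task that carries the constraint is
      // checked against the host; a task without it can still land there.
      static boolean regularConstraintVetoes(
          Optional<String> taskValue, Set<String> hostValues) {
        return taskValue.isPresent() && !hostValues.contains(taskValue.get());
      }

      // Dedicated host: the host itself demands a match, so any task
      // without a matching dedicated value is vetoed. This is what keeps
      // everyone else off the machine, e.g. a GPU pool admitting only
      // GPU-constrained jobs.
      static boolean dedicatedHostVetoes(
          Optional<String> taskValue, Set<String> hostDedicatedValues) {
        return !taskValue.isPresent()
            || !hostDedicatedValues.contains(taskValue.get());
      }
    }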
>>>>>>>> Also, regarding dedicated constraints necessitating a slave
>>>>>>>> restart - I've pondered moving dedicated machine management to
>>>>>>>> the scheduler for similar purposes. There's not really much
>>>>>>>> forcing that behavior to be managed with a slave attribute.
>>>>>>>
>>>>>>> Would you mind giving a few more hints on the mechanics behind
>>>>>>> this? How would the scheduler know about dedicated hw without the
>>>>>>> slave attributes set? Are you proposing storing a
>>>>>>> hostname->attribute mapping in the scheduler store?
>>>>>>>
>>>>>>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>
>>>>>>>> Joe - if you want to pursue this, I suggest you start another
>>>>>>>> thread to keep this thread's discussion intact. I will not be
>>>>>>>> able to lead this change, but can certainly shepherd!
>>>>>>>>
>>>>>>>> On Tuesday, January 19, 2016, Joe Smith <yasumo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> As an operator, that'd be a relatively simple change in tooling,
>>>>>>>>> and the benefits of not forcing a slave restart would be _huge_.
>>>>>>>>>
>>>>>>>>> Keeping the dedicated semantics (but adding non-exclusive) would
>>>>>>>>> be ideal if possible.
>>>>>>>>>
>>>>>>>>>> On Jan 19, 2016, at 19:09, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Also, regarding dedicated constraints necessitating a slave
>>>>>>>>>> restart - I've pondered moving dedicated machine management to
>>>>>>>>>> the scheduler for similar purposes. There's not really much
>>>>>>>>>> forcing that behavior to be managed with a slave attribute.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <j...@conductant.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Has anyone explored the idea of having a non-exclusive (wrt
>>>>>>>>>>>> job role) dedicated constraint in Aurora before?
>>>>>>>>>>>>
>>>>>>>>>>>> We do have a dedicated constraint now, but it assumes a 1:1
>>>>>>>>>>>> relationship between a job role and a slave attribute [1].
>>>>>>>>>>>> For example: a 'www-data/prod/hello' job with a dedicated
>>>>>>>>>>>> constraint of 'dedicated': 'www-data/hello' may only be
>>>>>>>>>>>> pinned to a particular set of slaves if all of them have the
>>>>>>>>>>>> 'www-data/hello' attribute set. No other role's tasks will be
>>>>>>>>>>>> able to land on those slaves unless their 'role/name' pair is
>>>>>>>>>>>> added to the slave attribute set.
>>>>>>>>>>>>
>>>>>>>>>>>> The above is very limiting, as it prevents carving out
>>>>>>>>>>>> subsets of a shared pool cluster to be used by multiple roles
>>>>>>>>>>>> at the same time. Would it make sense to have a free-form
>>>>>>>>>>>> dedicated constraint not bound to a particular role? Multiple
>>>>>>>>>>>> jobs could then use this type of constraint dynamically
>>>>>>>>>>>> without modifying the slave command line (and requiring a
>>>>>>>>>>>> slave restart).
>>>>>>>>>>>
>>>>>>>>>>> Can't this just be any old Constraint (not named "dedicated")?
>>>>>>>>>>> In other words, doesn't this code already deal with
>>>>>>>>>>> non-dedicated constraints?:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>>>>>
>>>>>>>>>>>> This could be quite useful for experimentation purposes
>>>>>>>>>>>> (e.g. a different host OS) or to target a different hardware
>>>>>>>>>>>> offering (e.g. GPUs). In other words, only those jobs that
>>>>>>>>>>>> explicitly opt in to participate in an experiment or hw
>>>>>>>>>>>> offering would be landing on that slave set.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Maxim
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> John Sirois
>>>>>>>>>>> 303-512-3301
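For context on [1]: the rule enforced there at job submission time is
essentially a role-prefix check on the dedicated constraint value,
roughly like the sketch below (paraphrased with illustrative names, not
the actual ConfigurationManager code). Relaxing that check, e.g.
allowing "*" as the role part, is what the tellapart patch does:

    // Paraphrased sketch of the submission-time rule in [1]: a dedicated
    // constraint value must name the job's own role, i.e. "<role>/<name>".
    // Illustrative only; not the actual ConfigurationManager code.
    public final class DedicatedValidationSketch {

      static void validateDedicated(String jobRole, String dedicatedValue) {
        String rolePart = dedicatedValue.split("/", 2)[0];
        if (!jobRole.equals(rolePart)) {
          throw new IllegalArgumentException(
              "Job in role " + jobRole
                  + " may not use hosts dedicated to " + rolePart);
        }
      }

      public static void main(String[] args) {
        validateDedicated("www-data", "www-data/hello"); // ok
        validateDedicated("other", "www-data/hello");    // throws
      }
    }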