I should have looked more closely, you are right! This indeed addresses both cases: a job with a named dedicated role is still allowed to get through if its role matches the constraint, and everything else (non-exclusive dedicated pool) is addressed with "*".
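
A minimal sketch of the two checks as they are described in this thread, for readers who want the rule spelled out. It is illustrative Python, not Aurora's Java code; the function names and the `WILDCARD_ROLE` constant are hypothetical, and the behavior is assumed from the descriptions above (role must match or be "*" at validation time, plain string equality at placement time).

```python
from typing import AbstractSet, Optional

# Hypothetical name: the wildcard role the patch is described as accepting.
WILDCARD_ROLE = "*"


def is_dedicated_value_allowed(job_role: str, dedicated_value: str) -> bool:
    """Validation-time check (assumed): the value has the form 'role/name',
    and the role part must be the job's own role or the wildcard."""
    role, sep, name = dedicated_value.partition("/")
    if not sep or not name:
        return False  # not in 'role/name' form
    return role in (job_role, WILDCARD_ROLE)


def host_accepts(host_dedicated_values: AbstractSet[str],
                 task_dedicated_value: Optional[str]) -> bool:
    """Placement-time check (assumed): plain string equality, so '*' is
    never expanded against a host attribute such as 'role1/test'."""
    if task_dedicated_value is None:
        # A task without a dedicated constraint is kept off dedicated hosts.
        return not host_dedicated_values
    return task_dedicated_value in host_dedicated_values


if __name__ == "__main__":
    print(is_dedicated_value_allowed("www-data", "www-data/hello"))
    print(is_dedicated_value_allowed("www-data", "*/secure"))
    print(host_accepts({"role1/test"}, "*/test"))
```

Under these assumptions a "*/secure" constraint passes validation for any role, yet only lands on hosts whose attribute is literally "*/secure", so pools dedicated to a named role stay exclusive.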
What it does not solve, though, is the variety of non-exclusive dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For that we would need something similar to what Bill suggested.

On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sniem...@apache.org> wrote:

> An arbitrary job can't target a fully dedicated role with this patch, it
> will still get a "constraint not satisfied: dedicated" error. The code in
> the scheduler that matches the constraints does a simple string match, so
> "*/test" will not match "role1/test" when trying to place the task, it will
> only match "*/test".
>
> On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
>> Thanks for the info, Steve! Yes, it would accomplish the same goal but
>> at the price of removing the exclusive dedicated constraint
>> enforcement. With this patch any job could target a fully dedicated
>> exclusive pool, which may be undesirable for dedicated pool owners.
>>
>> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniem...@apache.org>
>> wrote:
>> > We've been running a trivial patch [1] that does what I believe you're
>> > talking about for awhile now. It allows a * for the role name, basically
>> > allowing any role to match the constraint, so our constraints look like
>> > "*/secure"
>> >
>> > Our use case is we have a "secure" cluster of machines that is constrained
>> > on what can run on it (via an external audit process) that multiple roles
>> > run on.
>> >
>> > I believe I had talked to Bill about this a few months ago, but I don't
>> > remember where it ended up.
>> >
>> > [1]
>> > https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>> >
>> > On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <ma...@apache.org>
>> > wrote:
>> >
>> >> Oh, I didn't mean the memory GC pressure in the pure sense, rather a
>> >> logical garbage of orphaned hosts that never leave the scheduler.
>> >> It's not something to be concerned about from the performance standpoint.
>> >> It's, however, something operators need to be aware of when a host
>> >> from a dedicated pool gets dropped or replaced.
>> >>
>> >> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfar...@apache.org> wrote:
>> >> > What do you mean by GC burden? What i'm proposing is effectively
>> >> > Map<String, String>. Even with an extremely forgetful operator (even more
>> >> > than Joe!), it would require a huge oversight to put a dent in heap usage.
>> >> > I'm sure there are ways we could even expose a useful stat to flag such an
>> >> > oversight.
>> >> >
>> >> > On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>> >> >
>> >> >> Right, that's what I thought. Yes, it sounds interesting. My only
>> >> >> concern is the GC burden of getting rid of hostnames that are obsolete
>> >> >> and no longer exist. Relying on offers to update hostname 'relevance'
>> >> >> may not work as dedicated hosts may be fully packed and not release
>> >> >> any resources for a very long time. Let me explore this idea a bit to
>> >> >> see what it would take to implement.
>> >> >>
>> >> >> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfar...@apache.org> wrote:
>> >> >> > Not a host->attribute mapping (attribute in the mesos sense, anyway). Rather
>> >> >> > an out-of-band API for marking machines as reserved. For task->offer
>> >> >> > mapping it's just a matter of another data source. Does that make sense?
>> >> >> >
>> >> >> > On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>> >> >> >
>> >> >> >> > Can't this just be any old Constraint (not named "dedicated").
>> >> >> >> > In other words, doesn't this code already deal with non-dedicated
>> >> >> >> > constraints?:
>> >> >> >> >
>> >> >> >> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> >> >> >>
>> >> >> >> Not really. There is a subtle difference here. A regular (non-dedicated)
>> >> >> >> constraint does not prevent other tasks from landing on a given machine set
>> >> >> >> whereas dedicated keeps other tasks away by only allowing those matching
>> >> >> >> the dedicated attribute. What this proposal targets is allowing exclusive
>> >> >> >> machine pool matching any job that has this new constraint while keeping
>> >> >> >> all other tasks that don't have that attribute away.
>> >> >> >>
>> >> >> >> Following an example from my original post, imagine a GPU machine pool. Any
>> >> >> >> job (from any role) requiring GPU resource would be allowed while all other
>> >> >> >> jobs that don't have that constraint would be vetoed.
>> >> >> >>
>> >> >> >> > Also, regarding dedicated constraints necessitating a slave restart - i've
>> >> >> >> > pondered moving dedicated machine management to the scheduler for similar
>> >> >> >> > purposes. There's not really much forcing that behavior to be managed with
>> >> >> >> > a slave attribute.
>> >> >> >>
>> >> >> >> Would you mind giving a few more hints on the mechanics behind this? How
>> >> >> >> would scheduler know about dedicated hw without the slave attributes set?
>> >> >> >> Are you proposing storing hostname->attribute mapping in the scheduler
>> >> >> >> store?
>> >> >> >>
>> >> >> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfar...@apache.org> wrote:
>> >> >> >>
>> >> >> >> > Joe - if you want to pursue this, I suggest you start another thread to
>> >> >> >> > keep this thread's discussion in tact. I will not be able to lead this
>> >> >> >> > change, but can certainly shepherd!
>> >> >> >> >
>> >> >> >> > On Tuesday, January 19, 2016, Joe Smith <yasumo...@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > > As an operator, that'd be a relatively simple change in tooling, and the
>> >> >> >> > > benefits of not forcing a slave restart would be _huge_.
>> >> >> >> > >
>> >> >> >> > > Keeping the dedicated semantics (but adding non-exclusive) would be ideal
>> >> >> >> > > if possible.
>> >> >> >> > >
>> >> >> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfar...@apache.org> wrote:
>> >> >> >> > > >
>> >> >> >> > > > Also, regarding dedicated constraints necessitating a slave restart - i've
>> >> >> >> > > > pondered moving dedicated machine management to the scheduler for similar
>> >> >> >> > > > purposes. There's not really much forcing that behavior to be managed with
>> >> >> >> > > > a slave attribute.
>> >> >> >> > > >
>> >> >> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <j...@conductant.com> wrote:
>> >> >> >> > > >
>> >> >> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org>
>> >> >> >> > > >> wrote:
>> >> >> >> > > >>
>> >> >> >> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job role)
>> >> >> >> > > >>> dedicated constraint in Aurora before?
>> >> >> >> > > >>>
>> >> >> >> > > >>> We do have a dedicated constraint now but it assumes a 1:1
>> >> >> >> > > >>> relationship between a job role and a slave attribute [1]. For
>> >> >> >> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>> >> >> >> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
>> >> >> >> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No other
>> >> >> >> > > >>> role tasks will be able to land on those slaves unless their
>> >> >> >> > > >>> 'role/name' pair is added into the slave attribute set.
>> >> >> >> > > >>>
>> >> >> >> > > >>> The above is very limiting as it prevents carving out subsets of a
>> >> >> >> > > >>> shared pool cluster to be used by multiple roles at the same time.
>> >> >> >> > > >>> Would it make sense to have a free-form dedicated constraint not bound
>> >> >> >> > > >>> to a particular role? Multiple jobs could then use this type of
>> >> >> >> > > >>> constraint dynamically without modifying the slave command line (and
>> >> >> >> > > >>> requiring slave restart).
>> >> >> >> > > >>
>> >> >> >> > > >> Can't this just be any old Constraint (not named "dedicated"). In other
>> >> >> >> > > >> words, doesn't this code already deal with non-dedicated constraints?:
>> >> >> >> > > >>
>> >> >> >> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> >> >> >> > > >>
>> >> >> >> > > >>> This could be quite useful for experimenting purposes (e.g. different
>> >> >> >> > > >>> host OS) or to target a different hardware offering (e.g. GPUs).
>> >> >> >> > > >>> In other words, only those jobs that explicitly opt-in to participate in
>> >> >> >> > > >>> an experiment or hw offering would be landing on that slave set.
>> >> >> >> > > >>>
>> >> >> >> > > >>> Thanks,
>> >> >> >> > > >>> Maxim
>> >> >> >> > > >>>
>> >> >> >> > > >>> [1]-
>> >> >> >> > > >>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>> >> >> >> > > >>
>> >> >> >> > > >> --
>> >> >> >> > > >> John Sirois
>> >> >> >> > > >> 303-512-3301