Thanks for the info, Steve! Yes, it would accomplish the same goal but at the price of removing the exclusive dedicated constraint enforcement. With this patch any job could target a fully dedicated exclusive pool, which may be undesirable for dedicated pool owners.
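
To make the trade-off concrete, here is a minimal Java sketch of the two checks involved, under a deliberately simplified model: a job's dedicated value must be "role/name" with the job's own role as the prefix, and each host carries at most one dedicated value. `DedicatedSketch`, `isValidDedicatedValue`, `hostAccepts` and the `allowWildcardRole` flag are hypothetical names, not Aurora's API, and the wildcard branch only approximates what the "*" role patch is described as doing.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of a "dedicated" constraint.
 * Not Aurora's actual code: names and structure are illustrative assumptions.
 */
final class DedicatedSketch {

    private DedicatedSketch() {}

    /**
     * Config-time check: the dedicated value must look like "role/name" and its role
     * prefix must equal the job's role. With allowWildcardRole (an assumed stand-in for
     * the "*" patch), a "*" prefix is accepted for any role.
     */
    static boolean isValidDedicatedValue(String jobRole, String value, boolean allowWildcardRole) {
        int slash = value.indexOf('/');
        if (slash <= 0 || slash == value.length() - 1) {
            return false; // both the role and the name parts must be non-empty
        }
        String role = value.substring(0, slash);
        return role.equals(jobRole) || (allowWildcardRole && role.equals("*"));
    }

    /**
     * Schedule-time check with exclusive semantics: a host carrying a dedicated value only
     * accepts tasks declaring exactly that value, and a task declaring a dedicated value
     * only lands on hosts carrying it. Hosts and tasks with neither are unaffected.
     */
    static boolean hostAccepts(Optional<String> hostDedicated, Optional<String> taskDedicated) {
        if (hostDedicated.isPresent() || taskDedicated.isPresent()) {
            return hostDedicated.equals(taskDedicated);
        }
        return true;
    }
}
```

In this model a host tagged "*/secure" still vetoes tasks that lack the constraint, but once `allowWildcardRole` is on, nothing restricts which roles may declare "*/secure", so the pool owner loses control over who lands there.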

On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniem...@apache.org> wrote:
> We've been running a trivial patch [1] that does what I believe you're talking about for a while now. It allows a * for the role name, basically allowing any role to match the constraint, so our constraints look like "*/secure".
>
> Our use case is a "secure" cluster of machines that is constrained in what can run on it (via an external audit process) and that multiple roles run on.
>
> I believe I had talked to Bill about this a few months ago, but I don't remember where it ended up.
>
> [1] https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>
> On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>
>> Oh, I didn't mean memory GC pressure in the pure sense, but rather the logical garbage of orphaned hosts that never leave the scheduler. It's not something to be concerned about from the performance standpoint. It is, however, something operators need to be aware of when a host from a dedicated pool gets dropped or replaced.
>>
>> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfar...@apache.org> wrote:
>>> What do you mean by GC burden? What I'm proposing is effectively Map<String, String>. Even with an extremely forgetful operator (even more than Joe!), it would require a huge oversight to put a dent in heap usage. I'm sure there are ways we could even expose a useful stat to flag such an oversight.
>>>
>>> On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>> Right, that's what I thought. Yes, it sounds interesting. My only concern is the GC burden of getting rid of hostnames that are obsolete and no longer exist. Relying on offers to update hostname 'relevance' may not work, as dedicated hosts may be fully packed and not release any resources for a very long time. Let me explore this idea a bit to see what it would take to implement.
>>>>
>>>> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>> Not a host->attribute mapping (attribute in the Mesos sense, anyway). Rather, an out-of-band API for marking machines as reserved. For task->offer mapping it's just a matter of another data source. Does that make sense?
>>>>>
>>>>> On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>> Can't this just be any old Constraint (not named "dedicated")? In other words, doesn't this code already deal with non-dedicated constraints?:
>>>>>>>
>>>>>>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>
>>>>>> Not really. There is a subtle difference here. A regular (non-dedicated) constraint does not prevent other tasks from landing on a given machine set, whereas a dedicated constraint keeps other tasks away by only allowing those matching the dedicated attribute. What this proposal targets is allowing an exclusive machine pool to accept any job that has this new constraint, while keeping away all other tasks that don't have that attribute.
>>>>>>
>>>>>> Following an example from my original post, imagine a GPU machine pool. Any job (from any role) requiring GPU resources would be allowed, while all other jobs that don't have that constraint would be vetoed.
>>>>>>
>>>>>>> Also, regarding dedicated constraints necessitating a slave restart - I've pondered moving dedicated machine management to the scheduler for similar purposes. There's not really much forcing that behavior to be managed with a slave attribute.
>>>>>>
>>>>>> Would you mind giving a few more hints on the mechanics behind this? How would the scheduler know about dedicated hw without the slave attributes set? Are you proposing storing a hostname->attribute mapping in the scheduler store?
>>>>>>
>>>>>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>>> Joe - if you want to pursue this, I suggest you start another thread to keep this thread's discussion intact. I will not be able to lead this change, but can certainly shepherd!
>>>>>>>
>>>>>>> On Tuesday, January 19, 2016, Joe Smith <yasumo...@gmail.com> wrote:
>>>>>>>> As an operator, that'd be a relatively simple change in tooling, and the benefits of not forcing a slave restart would be _huge_.
>>>>>>>>
>>>>>>>> Keeping the dedicated semantics (but adding non-exclusive) would be ideal if possible.
>>>>>>>>
>>>>>>>>> On Jan 19, 2016, at 19:09, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Also, regarding dedicated constraints necessitating a slave restart - I've pondered moving dedicated machine management to the scheduler for similar purposes. There's not really much forcing that behavior to be managed with a slave attribute.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <j...@conductant.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Has anyone explored an idea of having a non-exclusive (wrt job role) dedicated constraint in Aurora before?
>>>>>>>>>>>
>>>>>>>>>>> We do have a dedicated constraint now but it assumes a 1:1 relationship between a job role and a slave attribute [1]. For example: a 'www-data/prod/hello' job with a dedicated constraint of 'dedicated': 'www-data/hello' may only be pinned to a particular set of slaves if all of them have the 'www-data/hello' attribute set. No other role's tasks will be able to land on those slaves unless their 'role/name' pair is added into the slave attribute set.
>>>>>>>>>>>
>>>>>>>>>>> The above is very limiting as it prevents carving out subsets of a shared pool cluster to be used by multiple roles at the same time. Would it make sense to have a free-form dedicated constraint not bound to a particular role? Multiple jobs could then use this type of constraint dynamically without modifying the slave command line (and requiring a slave restart).
>>>>>>>>>>
>>>>>>>>>> Can't this just be any old Constraint (not named "dedicated")? In other words, doesn't this code already deal with non-dedicated constraints?:
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>>>>
>>>>>>>>>>> This could be quite useful for experimenting purposes (e.g. a different host OS) or to target a different hardware offering (e.g. GPUs). In other words, only those jobs that explicitly opt in to participate in an experiment or hw offering would be landing on that slave set.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Maxim
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> John Sirois
>>>>>>>>>> 303-512-3301
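
The role-agnostic variant proposed in this thread, with reservations kept in the scheduler instead of slave attributes, might look roughly like the following Java sketch. Everything here is hypothetical: `ReservationSketch`, `reserve`, `release`, `accepts` and `orphanedHosts` are illustrative names, the pool name "gpu" is made up, and the `Map<String, String>` only mirrors the hostname-to-reservation map described in the thread, not any existing Aurora store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of scheduler-side, role-agnostic host reservations.
 * Not an Aurora API: all names here are illustrative assumptions.
 */
final class ReservationSketch {
    // hostname -> pool name (e.g. "gpu"); effectively a Map<String, String>.
    private final Map<String, String> reservations = new HashMap<>();

    /** Assumed out-of-band operator call: reserve a host for a pool, no agent restart. */
    void reserve(String hostname, String pool) {
        reservations.put(hostname, pool);
    }

    /** Assumed operator call to drop a reservation, e.g. when a host is replaced. */
    void release(String hostname) {
        reservations.remove(hostname);
    }

    /**
     * Exclusive but role-agnostic: a reserved host accepts only tasks that opt in to its
     * pool, whatever their role; a task that opts in lands only on hosts reserved for it.
     */
    boolean accepts(String hostname, Optional<String> taskPool) {
        Optional<String> hostPool = Optional.ofNullable(reservations.get(hostname));
        if (hostPool.isPresent() || taskPool.isPresent()) {
            return hostPool.equals(taskPool);
        }
        return true; // neither side reserved: ordinary shared scheduling
    }

    /**
     * Reserved hostnames absent from knownHosts: the "logical garbage" left behind when a
     * dedicated host is dropped or replaced, which could be exported as a stat.
     * Assumes knownHosts comes from something other than recent offers.
     */
    Set<String> orphanedHosts(Set<String> knownHosts) {
        Set<String> orphaned = new TreeSet<>(reservations.keySet());
        orphaned.removeAll(knownHosts);
        return orphaned;
    }
}
```

The `knownHosts` assumption matters because, as noted in the thread, a fully packed dedicated host may not release resources (and so may not appear in offers) for a very long time, so offer recency alone would misreport live hosts as orphaned.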