Reactivating this thread. I like Bill's suggestion to have a dedicated-constraint management system in the scheduler. It will, however, require substantial effort to get done properly. Would anyone oppose adopting Steve's patch in the meantime? The ROI is so high it would be a crime NOT to take it :)
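For anyone skimming the quoted discussion below, the essence of Steve's patch can be sketched roughly as follows. This is illustrative Java with hypothetical names, not the actual patch (the real commit is linked as [1] downthread): a dedicated constraint value of the form "*/name" is accepted from any role, while placement matching stays a plain string comparison.

```java
import java.util.Set;

// Sketch only: class and method names are mine, not the scheduler's.
final class DedicatedSketch {
  private static final String WILDCARD_ROLE = "*";

  // Validation: a job may only use "<its own role>/name", or the shared
  // "*/name" form that any role is allowed to target.
  static boolean isAllowedFor(String jobRole, String constraintValue) {
    String rolePrefix = constraintValue.split("/", 2)[0];
    return rolePrefix.equals(jobRole) || rolePrefix.equals(WILDCARD_ROLE);
  }

  // Placement: exact string match only, so "*/test" does not match a host
  // advertising "role1/test", and vice versa.
  static boolean matchesHost(String constraintValue, Set<String> hostDedicatedValues) {
    return hostDedicatedValues.contains(constraintValue);
  }
}
```

Note the property Steve calls out downthread: an exclusive pool tagged "role1/test" still rejects a "*/test" job, so existing exclusive pools are unaffected.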
On Wed, Jan 20, 2016 at 10:25 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> I should have looked more closely; you are right! This indeed addresses
> both cases: a job with a named dedicated role is still allowed to get
> through if its role matches the constraint, and everything else
> (the non-exclusive dedicated pool) is addressed with "*".
>
> What it does not solve, though, is the variety of non-exclusive
> dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
> that we would need something similar to what Bill suggested.
>
> On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sniem...@apache.org> wrote:
>> An arbitrary job can't target a fully dedicated role with this patch; it
>> will still get a "constraint not satisfied: dedicated" error. The code in
>> the scheduler that matches the constraints does a simple string match, so
>> "*/test" will not match "role1/test" when trying to place the task; it
>> will only match "*/test".
>>
>> On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>
>>> Thanks for the info, Steve! Yes, it would accomplish the same goal, but
>>> at the price of removing the exclusive dedicated constraint
>>> enforcement. With this patch any job could target a fully dedicated
>>> exclusive pool, which may be undesirable for dedicated pool owners.
>>>
>>> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniem...@apache.org> wrote:
>>>> We've been running a trivial patch [1] that does what I believe you're
>>>> talking about for a while now. It allows a * for the role name,
>>>> basically allowing any role to match the constraint, so our constraints
>>>> look like "*/secure".
>>>>
>>>> Our use case is that we have a "secure" cluster of machines, constrained
>>>> in what can run on it (via an external audit process), that multiple
>>>> roles run on.
>>>>
>>>> I believe I had talked to Bill about this a few months ago, but I don't
>>>> remember where it ended up.
>>>>
>>>> [1]
>>>> https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>>>>
>>>> On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>
>>>>> Oh, I didn't mean memory GC pressure in the pure sense, rather the
>>>>> logical garbage of orphaned hosts that never leave the scheduler. It's
>>>>> not something to be concerned about from a performance standpoint.
>>>>> It is, however, something operators need to be aware of when a host
>>>>> from a dedicated pool gets dropped or replaced.
>>>>>
>>>>> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>> What do you mean by GC burden? What I'm proposing is effectively a
>>>>>> Map<String, String>. Even with an extremely forgetful operator (even
>>>>>> more forgetful than Joe!), it would require a huge oversight to put a
>>>>>> dent in heap usage. I'm sure there are ways we could even expose a
>>>>>> useful stat to flag such an oversight.
>>>>>>
>>>>>> On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>
>>>>>>> Right, that's what I thought. Yes, it sounds interesting. My only
>>>>>>> concern is the GC burden of getting rid of hostnames that are
>>>>>>> obsolete and no longer exist. Relying on offers to update hostname
>>>>>>> 'relevance' may not work, as dedicated hosts may be fully packed and
>>>>>>> not release any resources for a very long time. Let me explore this
>>>>>>> idea a bit to see what it would take to implement.
>>>>>>>
>>>>>>> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>> Not a host->attribute mapping (attribute in the Mesos sense,
>>>>>>>> anyway). Rather an out-of-band API for marking machines as
>>>>>>>> reserved. For task->offer mapping it's just a matter of another
>>>>>>>> data source. Does that make sense?
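To make Bill's Map<String, String> suggestion above concrete, here is a minimal sketch of what an in-scheduler reservation store might look like. All names are hypothetical; this is the shape of the idea, not a proposed API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of Bill's idea: the scheduler keeps its own
// hostname -> reservation mapping, populated through an out-of-band
// operator API rather than slave attributes.
final class HostReservations {
  private final Map<String, String> byHost = new ConcurrentHashMap<>();

  // Operator-facing calls; no slave restart needed to change a reservation.
  void reserve(String hostname, String reservation) {
    byHost.put(hostname, reservation);
  }

  void release(String hostname) {
    byHost.remove(hostname);
  }

  // Consulted during task->offer matching as just another data source.
  Optional<String> reservationFor(String hostname) {
    return Optional.ofNullable(byHost.get(hostname));
  }

  // Could back a stat flagging Maxim's concern: entries for hosts that
  // were dropped or replaced and never explicitly released.
  int size() {
    return byHost.size();
  }
}
```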
>>>>>>>>
>>>>>>>> On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> > Can't this just be any old Constraint (not named "dedicated")?
>>>>>>>>> > In other words, doesn't this code already deal with non-dedicated
>>>>>>>>> > constraints?:
>>>>>>>>> >
>>>>>>>>> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>>>
>>>>>>>>> Not really. There is a subtle difference here. A regular
>>>>>>>>> (non-dedicated) constraint does not prevent other tasks from
>>>>>>>>> landing on a given machine set, whereas dedicated keeps other tasks
>>>>>>>>> away by only allowing those matching the dedicated attribute. What
>>>>>>>>> this proposal targets is an exclusive machine pool matching any job
>>>>>>>>> that has this new constraint, while keeping away all other tasks
>>>>>>>>> that don't have that attribute.
>>>>>>>>>
>>>>>>>>> Following the example from my original post, imagine a GPU machine
>>>>>>>>> pool. Any job (from any role) requiring the GPU resource would be
>>>>>>>>> allowed, while all other jobs that don't have that constraint would
>>>>>>>>> be vetoed.
>>>>>>>>>
>>>>>>>>> > Also, regarding dedicated constraints necessitating a slave
>>>>>>>>> > restart - I've pondered moving dedicated machine management to
>>>>>>>>> > the scheduler for similar purposes. There's not really much
>>>>>>>>> > forcing that behavior to be managed with a slave attribute.
>>>>>>>>>
>>>>>>>>> Would you mind giving a few more hints on the mechanics behind
>>>>>>>>> this? How would the scheduler know about dedicated hw without the
>>>>>>>>> slave attributes set?
>>>>>>>>> Are you proposing storing a hostname->attribute mapping in the
>>>>>>>>> scheduler store?
>>>>>>>>>
>>>>>>>>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Joe - if you want to pursue this, I suggest you start another
>>>>>>>>>> thread to keep this thread's discussion intact. I will not be able
>>>>>>>>>> to lead this change, but can certainly shepherd!
>>>>>>>>>>
>>>>>>>>>> On Tuesday, January 19, 2016, Joe Smith <yasumo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> As an operator, that'd be a relatively simple change in tooling,
>>>>>>>>>>> and the benefits of not forcing a slave restart would be _huge_.
>>>>>>>>>>>
>>>>>>>>>>> Keeping the dedicated semantics (but adding non-exclusive) would
>>>>>>>>>>> be ideal if possible.
>>>>>>>>>>>
>>>>>>>>>>> > On Jan 19, 2016, at 19:09, Bill Farner <wfar...@apache.org> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Also, regarding dedicated constraints necessitating a slave
>>>>>>>>>>> > restart - I've pondered moving dedicated machine management to
>>>>>>>>>>> > the scheduler for similar purposes. There's not really much
>>>>>>>>>>> > forcing that behavior to be managed with a slave attribute.
>>>>>>>>>>> >
>>>>>>>>>>> > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <j...@conductant.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >>> Has anyone explored the idea of having a non-exclusive (wrt
>>>>>>>>>>> >>> job role) dedicated constraint in Aurora before?
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> We do have a dedicated constraint now, but it assumes a 1:1
>>>>>>>>>>> >>> relationship between a job role and a slave attribute [1].
>>>>>>>>>>> >>> For example: a 'www-data/prod/hello' job with a dedicated
>>>>>>>>>>> >>> constraint of 'dedicated': 'www-data/hello' may only be
>>>>>>>>>>> >>> pinned to a particular set of slaves if all of them have the
>>>>>>>>>>> >>> 'www-data/hello' attribute set. No other role's tasks will be
>>>>>>>>>>> >>> able to land on those slaves unless their 'role/name' pair is
>>>>>>>>>>> >>> added to the slave attribute set.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> The above is very limiting, as it prevents carving out
>>>>>>>>>>> >>> subsets of a shared-pool cluster to be used by multiple roles
>>>>>>>>>>> >>> at the same time. Would it make sense to have a free-form
>>>>>>>>>>> >>> dedicated constraint not bound to a particular role? Multiple
>>>>>>>>>>> >>> jobs could then use this type of constraint dynamically
>>>>>>>>>>> >>> without modifying the slave command line (and requiring a
>>>>>>>>>>> >>> slave restart).
>>>>>>>>>>> >>
>>>>>>>>>>> >> Can't this just be any old Constraint (not named "dedicated")?
>>>>>>>>>>> >> In other words, doesn't this code already deal with
>>>>>>>>>>> >> non-dedicated constraints?:
>>>>>>>>>>> >>
>>>>>>>>>>> >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>>>>> >>
>>>>>>>>>>> >>> This could be quite useful for experimentation purposes (e.g.
>>>>>>>>>>> >>> a different host OS) or to target a different hardware
>>>>>>>>>>> >>> offering (e.g. GPUs). In other words, only those jobs that
>>>>>>>>>>> >>> explicitly opt in to participate in an experiment or hw
>>>>>>>>>>> >>> offering would land on that slave set.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>> >>> Maxim
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> [1] https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> John Sirois
>>>>>>>>>>> >> 303-512-3301
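Summing up the semantics proposed in Maxim's original post: a non-exclusive dedicated pool admits any task that opts in to its tag, regardless of role, and vetoes everything else; likewise, tagged tasks land only on hosts carrying the tag. A toy sketch of that filter decision (hypothetical names, not scheduler code):

```java
import java.util.Optional;

// Toy model of the non-exclusive dedicated semantics from the GPU example.
final class NonExclusivePoolFilter {
  // hostTag: the pool tag on the host, e.g. "gpu"; empty for shared hosts.
  // taskTag: the tag the task opted in to, if any.
  // Accept only when they agree: pooled hosts veto untagged tasks, and
  // tagged tasks are kept off shared hosts.
  static boolean accepts(Optional<String> hostTag, Optional<String> taskTag) {
    return hostTag.equals(taskTag);
  }
}
```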