Here is RB: https://reviews.apache.org/r/44602/

On Wed, Mar 9, 2016 at 2:41 PM, Bill Farner <wfar...@apache.org> wrote:
> Ah, so it only practically makes sense when the dedicated attribute is
> */something, but * alone would not make much sense. Seems reasonable to
> me.

On Wed, Mar 9, 2016 at 2:32 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> It's an *easy* way to get a virtual cluster with specific requirements.
> One example: have a set of machines in a shared pool with a different
> OS. This would let any existing or new customers try their services for
> compliance. The alternative would be spinning off a completely new
> physical cluster, which is a huge overhead on both the supply and demand
> sides.

On Wed, Mar 9, 2016 at 2:26 PM, Bill Farner <wfar...@apache.org> wrote:
> What does it mean to have a 'dedicated' host that's free-for-all like
> that?

On Wed, Mar 9, 2016 at 2:16 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> Reactivating this thread. I like Bill's suggestion to have a
> scheduler-side dedicated constraint management system. It will, however,
> require a substantial effort to get done properly. Would anyone oppose
> adopting Steve's patch in the meantime? The ROI is so high it would be a
> crime NOT to take it :)

On Wed, Jan 20, 2016 at 10:25 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> I should have looked more closely, you are right! This indeed addresses
> both cases: a job with a named dedicated role is still allowed to get
> through if its role matches the constraint, and everything else (a
> non-exclusive dedicated pool) is addressed with "*".
>
> What it does not solve, though, is the variety of non-exclusive
> dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For that
> we would need something similar to what Bill suggested.

On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sniem...@apache.org> wrote:
> An arbitrary job can't target a fully dedicated role with this patch; it
> will still get a "constraint not satisfied: dedicated" error. The code
> in the scheduler that matches the constraints does a simple string
> match, so "*/test" will not match "role1/test" when trying to place the
> task; it will only match "*/test".

On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> Thanks for the info, Steve! Yes, it would accomplish the same goal but
> at the price of removing the exclusive dedicated constraint enforcement.
> With this patch any job could target a fully dedicated exclusive pool,
> which may be undesirable for dedicated pool owners.

On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniem...@apache.org> wrote:
> We've been running a trivial patch [1] that does what I believe you're
> talking about for a while now. It allows a * for the role name,
> basically allowing any role to match the constraint, so our constraints
> look like "*/secure".
>
> Our use case: we have a "secure" cluster of machines, constrained in
> what can run on it (via an external audit process), that multiple roles
> run on.
>
> I believe I had talked to Bill about this a few months ago, but I don't
> remember where it ended up.
>
> [1]
> https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
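For readers skimming the thread, the behavior Steve describes could be sketched roughly as follows. This is an illustrative sketch only, with invented class and method names; the actual change is the commit linked above as [1]:

    // A minimal sketch (invented names, not the actual patch) of the two
    // rules Steve describes: validation accepts "*" as the role component
    // of a dedicated value, and placement remains an exact string match.
    final class WildcardDedicated {

      // Validation: a dedicated value has the form "<role>/<name>"; the
      // patch additionally accepts "*" as the role, e.g. "*/secure".
      static boolean isValidDedicatedValue(String value, String jobRole) {
        int slash = value.indexOf('/');
        if (slash <= 0) {
          return false;
        }
        String role = value.substring(0, slash);
        return role.equals(jobRole) || role.equals("*");
      }

      // Placement: a plain string comparison, so "*/test" matches only
      // hosts whose attribute value is literally "*/test", never
      // "role1/test". This is why an arbitrary job still cannot target a
      // fully dedicated role-named pool.
      static boolean hostMatches(String hostAttributeValue, String constraintValue) {
        return hostAttributeValue.equals(constraintValue);
      }
    }

Because matching stays exact, role-named pools keep their exclusivity; operators opt a pool into shared use by setting its attribute to the wildcard form.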
On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> Oh, I didn't mean the memory GC pressure in the pure sense, rather the
> logical garbage of orphaned hosts that never leave the scheduler. It's
> not something to be concerned about from the performance standpoint.
> It's, however, something operators need to be aware of when a host from
> a dedicated pool gets dropped or replaced.

On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfar...@apache.org> wrote:
> What do you mean by GC burden? What I'm proposing is effectively a
> Map<String, String>. Even with an extremely forgetful operator (even
> more than Joe!), it would require a huge oversight to put a dent in heap
> usage. I'm sure there are ways we could even expose a useful stat to
> flag such an oversight.

On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> Right, that's what I thought. Yes, it sounds interesting. My only
> concern is the GC burden of getting rid of hostnames that are obsolete
> and no longer exist. Relying on offers to update hostname 'relevance'
> may not work, as dedicated hosts may be fully packed and not release any
> resources for a very long time. Let me explore this idea a bit to see
> what it would take to implement.

On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfar...@apache.org> wrote:
> Not a host->attribute mapping (attribute in the Mesos sense, anyway).
> Rather an out-of-band API for marking machines as reserved. For
> task->offer mapping it's just a matter of another data source. Does that
> make sense?
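As an aside, the "effectively a Map<String, String>" idea Bill floats might look something like the following hypothetical sketch; no such store exists in Aurora at this point, and every name below is invented:

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical scheduler-side reservation store: hostname ->
    // dedicated value, maintained through an out-of-band operator API
    // instead of slave attributes (so no slave restart is needed).
    final class HostReservationStore {

      // Effectively the Map<String, String> from the thread.
      private final Map<String, String> reservations = new ConcurrentHashMap<>();

      // Operator API: reserve a host for a dedicated value.
      void reserve(String hostname, String dedicatedValue) {
        reservations.put(hostname, dedicatedValue);
      }

      // Operator API: forget a host. Maxim's concern is exactly this
      // entry: if an operator drops or replaces a dedicated host and
      // forgets to release it, the orphaned entry lingers, and relying
      // on offers to notice is unreliable because fully packed dedicated
      // hosts may not produce offers for a long time. A stat counting
      // unmatched hostnames could flag such oversights.
      void release(String hostname) {
        reservations.remove(hostname);
      }

      // Extra data source consulted during task -> offer matching.
      Optional<String> reservationFor(String hostname) {
        return Optional.ofNullable(reservations.get(hostname));
      }
    }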
On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>> Can't this just be any old Constraint (not named "dedicated")? In
>> other words, doesn't this code already deal with non-dedicated
>> constraints?:
>>
>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>
> Not really. There is a subtle difference here. A regular
> (non-dedicated) constraint does not prevent other tasks from landing on
> a given machine set, whereas dedicated keeps other tasks away by only
> allowing those matching the dedicated attribute. What this proposal
> targets is an exclusive machine pool matching any job that has this new
> constraint, while keeping away all other tasks that don't have that
> attribute.
>
> Following an example from my original post, imagine a GPU machine pool.
> Any job (from any role) requiring GPU resources would be allowed, while
> all other jobs that don't have that constraint would be vetoed.
>
>> Also, regarding dedicated constraints necessitating a slave restart -
>> I've pondered moving dedicated machine management to the scheduler for
>> similar purposes. There's not really much forcing that behavior to be
>> managed with a slave attribute.
>
> Would you mind giving a few more hints on the mechanics behind this?
> How would the scheduler know about dedicated hardware without the slave
> attributes set? Are you proposing storing a hostname->attribute mapping
> in the scheduler store?
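The distinction Maxim draws can be made concrete with a toy example (illustrative only, not scheduler code): a regular constraint limits only the task that carries it, while a dedicated attribute additionally vetoes every task that does not match it:

    import java.util.Optional;

    // Toy model of the asymmetry between regular and dedicated
    // constraints, using the GPU pool example from the thread.
    final class VetoSemantics {

      // Regular constraint: only the constrained task is limited; a
      // task with no constraint can land anywhere, including this host.
      static boolean regularAllows(Optional<String> taskConstraint, String hostValue) {
        return taskConstraint.map(hostValue::equals).orElse(true);
      }

      // Dedicated: the host side also rejects unconstrained tasks, so a
      // "gpu" pool admits only jobs that explicitly ask for it and
      // vetoes everything else.
      static boolean dedicatedAllows(Optional<String> taskConstraint, String hostDedicatedValue) {
        return taskConstraint.map(hostDedicatedValue::equals).orElse(false);
      }
    }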
On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfar...@apache.org> wrote:
> Joe - if you want to pursue this, I suggest you start another thread to
> keep this thread's discussion intact. I will not be able to lead this
> change, but can certainly shepherd!

On Tuesday, January 19, 2016, Joe Smith <yasumo...@gmail.com> wrote:
> As an operator, that'd be a relatively simple change in tooling, and
> the benefits of not forcing a slave restart would be _huge_.
>
> Keeping the dedicated semantics (but adding non-exclusive) would be
> ideal if possible.

On Jan 19, 2016, at 19:09, Bill Farner <wfar...@apache.org> wrote:
> Also, regarding dedicated constraints necessitating a slave restart -
> I've pondered moving dedicated machine management to the scheduler for
> similar purposes. There's not really much forcing that behavior to be
> managed with a slave attribute.

On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <j...@conductant.com> wrote:
> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
>> Has anyone explored the idea of having a non-exclusive (wrt job role)
>> dedicated constraint in Aurora before?
>>
>> We do have a dedicated constraint now, but it assumes a 1:1
>> relationship between a job role and a slave attribute [1]. For
>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
>> of slaves if all of them have the 'www-data/hello' attribute set. No
>> other role's tasks will be able to land on those slaves unless their
>> 'role/name' pair is added to the slave attribute set.
>>
>> The above is very limiting, as it prevents carving out subsets of a
>> shared-pool cluster to be used by multiple roles at the same time.
>> Would it make sense to have a free-form dedicated constraint not bound
>> to a particular role? Multiple jobs could then use this type of
>> constraint dynamically, without modifying the slave command line (and
>> requiring a slave restart).
>
> Can't this just be any old Constraint (not named "dedicated")? In other
> words, doesn't this code already deal with non-dedicated constraints?:
>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>
>> This could be quite useful for experimentation purposes (e.g. a
>> different host OS) or to target a different hardware offering (e.g.
>> GPUs). In other words, only those jobs that explicitly opt in to
>> participate in an experiment or hardware offering would land on that
>> slave set.
>>
>> Thanks,
>> Maxim
>>
>> [1]-
>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>
> --
> John Sirois
> 303-512-3301
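For context on the [1] reference, the 1:1 role coupling discussed throughout the thread boils down to a check of roughly this shape; this is a condensed illustration, and the real ConfigurationManager code differs in detail:

    // Condensed sketch of the 1:1 role-to-attribute coupling: a
    // dedicated constraint is rejected unless its value is prefixed
    // with the job's own role, e.g. role "www-data" may only use
    // "www-data/<name>". Relaxing exactly this check is what the
    // thread (and the TellApart patch) is about.
    final class DedicatedValidation {

      static void validate(String jobRole, String dedicatedValue) {
        if (!dedicatedValue.startsWith(jobRole + "/")) {
          throw new IllegalArgumentException(
              "Only " + jobRole + " may use hosts dedicated to that role.");
        }
      }
    }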