What kind of isolation features are you using? I would like to probe a little deeper here, because this is not an ideal rationale for changing the placement algorithm. Ideally, Mesos and Linux provide the right isolation technology to make this a non-problem.
I understand the push for job anti-affinity (i.e. don't put too many Kafka workers on one host in general), but I would imagine that would be for reliability reasons, not for performance reasons.

On Thu, Mar 30, 2017 at 12:16 PM, Rick Mangi <r...@chartbeat.com> wrote:

> Performance and utilization mostly. The Kafka consumers are CPU bound (and
> sometimes network) and the rest of our jobs are mostly memory bound. We've
> found that if too many consumers wind up on the same EC2 instance they
> don't perform as well. It's hard to prove this, but the gut feeling is
> pretty strong.
>
> On Mar 30, 2017, at 2:35 PM, Zameer Manji <zma...@apache.org> wrote:
>
>> Rick,
>>
>> Can you share why it would be nice to spread out these different jobs on
>> different hosts? Is it for reliability, performance, utilization, etc?
>>
>> On Thu, Mar 30, 2017 at 11:31 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>
>>> Yeah, we have a dozen or so Kafka consumer jobs running in our cluster,
>>> each having about 40 or so instances.
>>>
>>> On Mar 30, 2017, at 2:06 PM, David McLaughlin <da...@dmclaughlin.com> wrote:
>>>
>>>> There is absolutely a need for custom hook points in the scheduler
>>>> (injecting default constraints into running tasks, for example). I don't
>>>> think users should be asked to write custom scheduling algorithms to
>>>> solve the problems in this thread, though. There are also huge downsides
>>>> to exposing the internals of scheduling as part of a plugin API.
>>>>
>>>> Out of curiosity, do your Kafka consumers span multiple jobs? Otherwise,
>>>> host constraints solve that problem, right?
>>>>
>>>> On Mar 30, 2017, at 10:34 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>>>
>>>>> I think the complexity is a great rationale for having a pluggable
>>>>> scheduling layer. Aurora is very flexible and people use it in many
>>>>> different ways. Giving users more flexibility in how jobs are scheduled
>>>>> seems like it would be a good direction for the project.
>>>>>
>>>>> On Mar 30, 2017, at 12:16 PM, David McLaughlin <dmclaugh...@apache.org> wrote:
>>>>>
>>>>>> I think this is more complicated than multiple scheduling algorithms.
>>>>>> The problem you'll end up having if you try to solve this in the
>>>>>> scheduling loop is when resources are unavailable because there are
>>>>>> preemptible tasks running in them, rather than hosts being down. Right
>>>>>> now, the fact that the task cannot be scheduled is important because it
>>>>>> triggers preemption and will make room. An alternative algorithm that
>>>>>> tries at all costs to schedule the task in the TaskAssigner could
>>>>>> decide to place the task in a non-ideal slot and leave a preemptible
>>>>>> task running instead.
>>>>>>
>>>>>> It's also important to think of the knock-on effects here when we move
>>>>>> to offer affinity (i.e. the current Dynamic Reservation proposal). If
>>>>>> you've made this non-ideal compromise to get things scheduled, that
>>>>>> decision will basically be permanent until the host you're on goes
>>>>>> down. At least with how things work now, with each scheduling attempt
>>>>>> the job has a fresh chance of being put in an ideal slot.
>>>>>>
>>>>>> On Thu, Mar 30, 2017 at 8:12 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>>>>>
>>>>>>> Sorry for the late reply, but I wanted to chime in here as wanting to
>>>>>>> see this feature. We run a medium-sized cluster (around 1000 cores)
>>>>>>> in EC2, and I think we could get better usage of the cluster with
>>>>>>> more control over the distribution of job instances. For example, it
>>>>>>> would be nice to limit the number of Kafka consumers running on the
>>>>>>> same physical box.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Rick
>>>>>>>
>>>>>>> On 2017-03-06 14:44 (-0400), Mauricio Garavaglia <m...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I have a job that has multiple instances (>100) that I'd like to
>>>>>>>> spread across the hosts in a cluster. Using a constraint such as
>>>>>>>> "limit=host:1" doesn't work quite well, as I have more instances
>>>>>>>> than nodes.
>>>>>>>>
>>>>>>>> As a workaround I increased the limit value to something like
>>>>>>>> ceil(instances/nodes). But now the problem happens if a bunch of
>>>>>>>> nodes go down (think a whole rack dies), because the instances will
>>>>>>>> not run until they are back, even though we may have spare capacity
>>>>>>>> on the rest of the hosts that we'd like to use. In that scenario,
>>>>>>>> the job availability may be affected because it's running with fewer
>>>>>>>> instances than expected. On a smaller scale, the former approach
>>>>>>>> would also apply if you want to spread tasks across racks or
>>>>>>>> availability zones. I'd like to have one instance of a job per rack
>>>>>>>> (failure domain), but if a rack goes down, the instance can be
>>>>>>>> spawned on a different rack.
>>>>>>>>
>>>>>>>> I thought we could have a scheduling constraint to "spread" instances
>>>>>>>> across a particular host attribute; instead of vetoing an offer right
>>>>>>>> away, we check where the other instances of a task are running,
>>>>>>>> looking at a particular attribute of the host. We try to maximize the
>>>>>>>> number of different values of a particular attribute (rack, hostname,
>>>>>>>> etc.) across the task instance assignments.
>>>>>>>>
>>>>>>>> What do you think? Did something like this come up in the past? Is it
>>>>>>>> feasible?
>>>>>>>>
>>>>>>>> Mauricio
>>
>> --
>> Zameer Manji
>
> --
> Zameer Manji
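
To make the "spread" idea in Mauricio's message concrete, here is a minimal sketch of the heuristic as described: count how many instances of the job are already running per value of a host attribute, and prefer the offer whose attribute value currently hosts the fewest. This is purely illustrative Python, not Aurora scheduler code; the names pick_offer and spread_score and the offer/instance dictionaries are invented for the example.

    from collections import Counter

    def spread_score(attr_value, counts):
        # Lower is better: prefer attribute values that currently host
        # the fewest instances of this job.
        return counts.get(attr_value, 0)

    def pick_offer(offers, running_instances, attr="rack"):
        # offers: e.g. [{"host": "h1", "attributes": {"rack": "r1"}}, ...]
        # running_instances: same shape, one entry per running instance.
        # Returns the offer that best spreads instances across values of `attr`.
        counts = Counter(inst["attributes"].get(attr) for inst in running_instances)
        return min(offers, key=lambda o: spread_score(o["attributes"].get(attr), counts))

    # Example: rack r1 already hosts two instances and r2 hosts one,
    # so the offer from rack r3 wins.
    offers = [{"host": "a", "attributes": {"rack": "r1"}},
              {"host": "b", "attributes": {"rack": "r3"}}]
    running = [{"host": "c", "attributes": {"rack": "r1"}},
               {"host": "d", "attributes": {"rack": "r1"}},
               {"host": "e", "attributes": {"rack": "r2"}}]
    print(pick_offer(offers, running))  # -> {'host': 'b', 'attributes': {'rack': 'r3'}}

Unlike a hard limit constraint, a heuristic like this never vetoes an offer outright, so if a whole rack goes down the displaced instances land on the least-loaded remaining racks instead of staying pending.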