There is absolutely a need for custom hook points in the scheduler (injecting default constraints into running tasks, for example). I don't think users should be asked to write custom scheduling algorithms to solve the problems in this thread, though. There are also huge downsides to exposing the internals of scheduling as part of a plugin API.
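For anyone skimming the thread below: this is roughly what the limit constraints being discussed look like in an Aurora job definition. It's a sketch only; the cluster, role, task name, and instance counts are placeholders:

  # Sketch of a job that caps how many of its instances land on one machine.
  # 'consumer_task' and the cluster/role names are placeholders.
  jobs = [
    Service(
      cluster = 'example-cluster',
      environment = 'prod',
      role = 'example-role',
      name = 'kafka_consumer',
      task = consumer_task,
      instances = 200,
      # Limit constraint: at most 2 instances of this job per distinct value
      # of the 'host' attribute, i.e. per machine. With ~100 hosts and 200
      # instances this is the ceil(instances/nodes) workaround Mauricio
      # mentions below; swap 'host' for 'rack' to cap per rack instead.
      constraints = {'host': 'limit:2'},
    )
  ]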
Out of curiosity, do your Kafka consumers span multiple jobs? Otherwise, host constraints solve that problem, right?

> On Mar 30, 2017, at 10:34 AM, Rick Mangi <r...@chartbeat.com> wrote:
>
> I think the complexity is a great rationale for having a pluggable scheduling
> layer. Aurora is very flexible and people use it in many different ways.
> Giving users more flexibility in how jobs are scheduled seems like it would
> be a good direction for the project.
>
>> On Mar 30, 2017, at 12:16 PM, David McLaughlin <dmclaugh...@apache.org> wrote:
>>
>> I think this is more complicated than multiple scheduling algorithms. The
>> problem you'll end up having if you try to solve this in the scheduling
>> loop is when resources are unavailable because there are preemptible tasks
>> running in them, rather than hosts being down. Right now the fact that the
>> task cannot be scheduled is important, because it triggers preemption and
>> will make room. An alternative algorithm that tries at all costs to
>> schedule the task in the TaskAssigner could decide to place the task in a
>> non-ideal slot and leave a preemptible task running instead.
>>
>> It's also important to think of the knock-on effects here when we move to
>> offer affinity (i.e. the current Dynamic Reservation proposal). If you've
>> made this non-ideal compromise to get things scheduled, that decision will
>> basically be permanent until the host you're on goes down. At least with
>> how things work now, each scheduling attempt gives the job a fresh chance
>> of being put in an ideal slot.
>>
>>> On Thu, Mar 30, 2017 at 8:12 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>>
>>> Sorry for the late reply, but I wanted to chime in here as wanting to see
>>> this feature. We run a medium-size cluster (around 1000 cores) in EC2, and
>>> I think we could get better usage of the cluster with more control over
>>> the distribution of job instances. For example, it would be nice to limit
>>> the number of Kafka consumers running on the same physical box.
>>>
>>> Best,
>>>
>>> Rick
>>>
>>>> On 2017-03-06 14:44 (-0400), Mauricio Garavaglia <m...@gmail.com> wrote:
>>>> Hello!
>>>>
>>>> I have a job with many instances (>100) that I'd like to spread across
>>>> the hosts in a cluster. Using a constraint such as "limit=host:1" doesn't
>>>> work quite well, as I have more instances than nodes.
>>>>
>>>> As a workaround I increased the limit value to something like
>>>> ceil(instances/nodes). But now the problem happens if a bunch of nodes go
>>>> down (think a whole rack dies), because those instances will not run
>>>> until the nodes are back, even though we may have spare capacity on the
>>>> rest of the hosts that we'd like to use. In that scenario, the job's
>>>> availability may be affected because it's running with fewer instances
>>>> than expected. On a smaller scale, the same approach would also apply if
>>>> you want to spread tasks across racks or availability zones. I'd like to
>>>> have one instance of a job per rack (failure domain), but if that rack
>>>> goes down, the instance can be spawned on a different rack.
>>>>
>>>> I thought we could have a scheduling constraint to "spread" instances
>>>> across a particular host attribute; instead of vetoing an offer right
>>>> away, we check where the other instances of a task are running, looking
>>>> for a particular attribute of the host. We try to maximize the number of
>>>> different values of a particular attribute (rack, hostname, etc.) across
>>>> the task instance assignments.
>>>>
>>>> What do you think? Did something like this come up in the past? Is it
>>>> feasible?
>>>>
>>>> Mauricio
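For what it's worth, here is a toy sketch of the "spread" scoring Mauricio describes above. This is plain Python, not scheduler code (Aurora's scheduler is Java and its offer/task objects look different): rather than vetoing an offer outright, rank offers by how few instances of the job already sit on the offer's value of the chosen attribute.

  from collections import Counter

  def spread_score(offer_attr_value, running_attr_values):
      # Prefer offers whose attribute value (rack, host, zone, ...) currently
      # hosts the fewest instances of this job; fewer is better.
      counts = Counter(running_attr_values)
      return -counts[offer_attr_value]

  def pick_offer(offers, attr, running_instances):
      # offers / running_instances are plain dicts of host attributes here;
      # the real scheduler's offer and task representations differ.
      running_values = [inst[attr] for inst in running_instances]
      return max(offers, key=lambda o: spread_score(o[attr], running_values))

  # Example: rack 'r1' already has an instance, so the rack 'r2' offer wins.
  offers = [{'host': 'a1', 'rack': 'r1'}, {'host': 'b7', 'rack': 'r2'}]
  running = [{'host': 'a3', 'rack': 'r1'}]
  print(pick_offer(offers, 'rack', running))   # -> {'host': 'b7', 'rack': 'r2'}

The concerns raised earlier in the thread still apply to anything like this: a score-based placement has to interact sanely with preemption and with offer affinity.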