Fenzo integration should be considered.

Old thread
http://markmail.org/message/bjifjyvhvs2en3ts

If there are no volunteers by summer, I will take it up.

Thx

> On Mar 23, 2017, at 5:14 PM, Zameer Manji <zma...@apache.org> wrote:
> 
> Hey,
> 
> Sorry for the late reply.
> 
> It is possible to make this configurable. For example, we could implement
> multiple algorithms and switch between them using different flags. If the
> flag value names a class on the classpath that implements an interface, it
> can be 100% pluggable.
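A minimal sketch of what such a flag-driven plug-in point could look like. The interface, flag, and class names here are hypothetical illustrations, not Aurora's actual API:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical types standing in for Aurora's task and Mesos offer objects.
record TaskSpec(String name, double cpus, double ramMb) {}
record Offer(String id, String host, double cpus, double ramMb) {}

// Hypothetical plug-in point; Aurora's real TaskAssigner interface differs.
interface OfferSelector {
  /** Returns the chosen offer, or empty if no offer is acceptable. */
  Optional<Offer> select(TaskSpec task, List<Offer> offers);
}

final class OfferSelectorLoader {
  /**
   * Loads the implementation named by a flag value, e.g.
   * -scheduling_algorithm_impl=com.example.BinPackingSelector (hypothetical
   * flag), so the policy is swappable without touching the scheduler core.
   */
  static OfferSelector load(String className) throws ReflectiveOperationException {
    return Class.forName(className)
        .asSubclass(OfferSelector.class)
        .getDeclaredConstructor()
        .newInstance();
  }
}
```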
> 
> The primary parts of the scheduling code are `TaskScheduler` and
> `TaskAssigner`.
> 
> `TaskScheduler` receives requests to schedule tasks and does some
> validation and preparation. `TaskAssigner` implements the first-fit
> algorithm.
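For reference, first fit simply takes the first offer that satisfies the task's resource request. A sketch of that behavior using the hypothetical types from the snippet above (not the actual `TaskAssigner` code):

```java
import java.util.List;
import java.util.Optional;

// First fit: walk the offers in order and take the first one with enough
// resources; no scoring, no comparison between feasible offers.
final class FirstFitSelector implements OfferSelector {
  @Override
  public Optional<Offer> select(TaskSpec task, List<Offer> offers) {
    for (Offer offer : offers) {
      if (offer.cpus() >= task.cpus() && offer.ramMb() >= task.ramMb()) {
        return Optional.of(offer);  // first match wins
      }
    }
    return Optional.empty();  // nothing fits; the task waits for new offers
  }
}
```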
> 
> However, I feel the best move for the project would be to move away from
> first fit in order to support soft constraints. I think it is a very valid
> feature request and I believe it can be done without degrading performance.
> Ideally, we should just use an existing Java library that implements a
> well-known algorithm. For example, Netflix's Fenzo
> <https://github.com/Netflix/Fenzo> could be used here.
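The gain from moving off first fit is that feasible offers can be scored instead of taken in arrival order, which is where soft constraints come in. A rough sketch of that idea, again with the hypothetical types above; this is not Fenzo's API (Fenzo provides its own scheduler and fitness abstractions), just the general shape of scored assignment:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Scored assignment: every feasible offer gets a fitness score and the
// highest-scoring one wins. Soft constraints (e.g. "prefer to spread across
// racks") become terms in the score instead of hard vetoes.
final class ScoringSelector implements OfferSelector {
  /** Hypothetical scoring hook; soft constraints plug in here. */
  interface FitnessFunction {
    double score(TaskSpec task, Offer offer);
  }

  private final FitnessFunction fitness;

  ScoringSelector(FitnessFunction fitness) {
    this.fitness = fitness;
  }

  @Override
  public Optional<Offer> select(TaskSpec task, List<Offer> offers) {
    return offers.stream()
        .filter(o -> o.cpus() >= task.cpus() && o.ramMb() >= task.ramMb())
        .max(Comparator.comparingDouble((Offer o) -> fitness.score(task, o)));
  }
}
```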
> 
> On Wed, Mar 15, 2017 at 11:10 AM, Mauricio Garavaglia <
> mauriciogaravag...@gmail.com> wrote:
> 
>> Hi,
>> 
>> Rather than changing the scheduling algorithm, I think we should be open to
>> supporting multiple algorithms. First-fit is certainly a great solution for
>> humongous clusters with homogeneous workloads, but for smaller clusters we
>> could have more optimized scheduling without sacrificing scheduling
>> performance.
>> 
>> How difficult do you think it would be to start exploring that option? I
>> haven't looked into the scheduling side of the code :)
>> 
>>> On Mon, Mar 6, 2017 at 2:57 PM, Zameer Manji <zma...@apache.org> wrote:
>>> 
>>> Something similar was proposed on a Dynamic Reservations review and there
>>> is a ticket for it here: <https://issues.apache.org/jira/browse/AURORA-173>.
>>> 
>>> I think it is feasible, but it is important to note that this is a large
>>> change because we would be moving Aurora from first fit to some other
>>> algorithm.
>>> 
>>> If we do this, we need to ensure it scales to very large clusters and
>>> maintains reasonably low latency in assigning tasks to offers.
>>> 
>>> I support the idea of "spread", but it would need to come after a change to
>>> the scheduling algorithm.
>>> 
>>> On Mon, Mar 6, 2017 at 11:44 AM, Mauricio Garavaglia <
>>> mauriciogaravag...@gmail.com> wrote:
>>> 
>>>> Hello!
>>>> 
>>>> I have a job that has multiple instances (>100) that I'd like to spread
>>>> across the hosts in a cluster. Using a constraint such as "limit=host:1"
>>>> doesn't work quite well, as I have more instances than nodes.
>>>> 
>>>> As a workaround I increased the limit value to something like
>>>> ceil(instances/nodes). But now the problem happens if a bunch of nodes go
>>>> down (think a whole rack dies), because those instances will not run until
>>>> the nodes are back, even though we may have spare capacity on the rest of
>>>> the hosts that we'd like to use. In that scenario, the job availability may
>>>> be affected because it's running with fewer instances than expected. On a
>>>> smaller scale, the same approach would also apply if you want to spread
>>>> tasks across racks or availability zones. I'd like to have one instance of
>>>> a job per rack (failure domain), but if that rack goes down, the instance
>>>> can be spawned on a different rack.
>>>> 
>>>> I thought we could have a scheduling constraint to "spread" instances
>>>> across a particular host attribute; instead of vetoing an offer right away,
>>>> we would check where the other instances of a task are running, looking at
>>>> a particular attribute of the host. We would try to maximize the number of
>>>> distinct values of that attribute (rack, hostname, etc.) across the task's
>>>> instance assignments.
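An illustrative sketch of how such a spread preference could be scored rather than enforced as a hard veto, reusing the hypothetical fitness hook from the earlier snippet; the per-attribute instance counts are assumed to come from the scheduler's view of running tasks:

```java
import java.util.Map;

// Spread as a soft preference: hosts whose attribute value (rack, host, zone)
// already runs many instances of the job score lower, so new instances drift
// toward the least-used value instead of being vetoed outright.
final class SpreadFitness implements ScoringSelector.FitnessFunction {
  // Hypothetical inputs: attribute value per host (e.g. host -> rack) and the
  // number of this job's instances currently running per attribute value.
  private final Map<String, String> attributeByHost;
  private final Map<String, Integer> runningInstancesByValue;

  SpreadFitness(Map<String, String> attributeByHost,
                Map<String, Integer> runningInstancesByValue) {
    this.attributeByHost = attributeByHost;
    this.runningInstancesByValue = runningInstancesByValue;
  }

  @Override
  public double score(TaskSpec task, Offer offer) {
    String value = attributeByHost.getOrDefault(offer.host(), offer.host());
    int alreadyThere = runningInstancesByValue.getOrDefault(value, 0);
    return 1.0 / (1.0 + alreadyThere);  // fewer co-located instances, higher score
  }
}
```

With something along these lines, a dead rack only reduces the number of distinct attribute values available; the remaining instances still schedule on the surviving racks, which is the failover behavior described above.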
>>>> 
>>>> What do you think? Did something like this come up in the past? Is it
>>>> feasible?
>>>> 
>>>> 
>>>> Mauricio
>>>> 
>>>> --
>>>> Zameer Manji
>>>> 
>>> 
>> 
>> --
>> Zameer Manji
>> 
