Something similar was proposed on a Dynamic Reservations review and there
is a ticket for it here <https://issues.apache.org/jira/browse/AURORA-173>.

I think it is feasible, but it is important to note that this is a large
change, because it would move Aurora from first fit to some other
algorithm.

If we do this we need to ensure it scales to very large clusters and
keeps the latency of assigning tasks to offers reasonably low.

I support the idea of "spread", but it would need to come after a change
to the scheduling algorithm.
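
Purely as an illustration of what that could look like (this is not
Aurora's scheduler code, and every name below is made up): a
spread-aware assigner would score offers against the instances already
placed instead of taking the first viable offer, roughly like:

from collections import Counter

# Illustrative sketch only, not Aurora code. Pick the offer whose value
# of a given attribute (rack, host, ...) is least used by the job's
# already-running instances, instead of the first viable offer.
def pick_offer(offers, running_instances, attribute='rack'):
    usage = Counter(inst[attribute] for inst in running_instances)
    return min(offers, key=lambda offer: usage[offer[attribute]])

# Toy example: rack-a already hosts two instances, so the offer on
# rack-b wins.
offers = [{'host': 'h1', 'rack': 'rack-a'},
          {'host': 'h2', 'rack': 'rack-b'}]
running = [{'host': 'h4', 'rack': 'rack-a'},
           {'host': 'h5', 'rack': 'rack-a'}]
print(pick_offer(offers, running))   # {'host': 'h2', 'rack': 'rack-b'}

The extra bookkeeping per assignment is exactly where the scale and
latency concern above comes in.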

On Mon, Mar 6, 2017 at 11:44 AM, Mauricio Garavaglia <
mauriciogaravag...@gmail.com> wrote:

> Hello!
>
> I have a job that has multiple instances (>100) that I'd like to spread
> across the hosts in a cluster. Using a constraint such as "limit=host:1"
> doesn't work well, as I have more instances than nodes.
>
> As a workaround I increased the limit value to something like
> ceil(instances/nodes). But now a problem happens if a bunch of nodes go
> down (think a whole rack dies), because those instances will not run
> until the nodes are back, even though we may have spare capacity on the
> rest of the hosts that we'd like to use. In that scenario, the job's
> availability may be affected because it's running with fewer instances
> than expected. On a smaller scale, the same approach would also apply if
> you want to spread tasks across racks or availability zones. I'd like to
> have one instance of a job per rack (failure domain), but if that rack
> goes down, the instance can be spawned on a different rack.
>
> I thought we could have a scheduling constraint to "spread" instances
> across a particular host attribute; instead of vetoing an offer right
> away, we would check where the other instances of the task are running,
> looking at a particular attribute of the host. We would try to maximize
> the number of distinct values of that attribute (rack, hostname, etc.)
> across the task's instance assignments.
>
> What do you think? Did something like this come up in the past? Is it
> feasible?
>
>
> Mauricio
>
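
For reference, the ceil(instances/nodes) workaround described above
could be written roughly like this in a job config. This is a rough
sketch only: the cluster, role, and job names are placeholders, the
instance and host counts are illustrative, and example_task is assumed
to be defined earlier in the file.

import math

NUM_INSTANCES = 120   # illustrative
NUM_HOSTS = 50        # illustrative

jobs = [Job(
  cluster = 'example-cluster',   # placeholder
  role = 'www-data',             # placeholder
  environment = 'prod',
  name = 'spread_example',
  task = example_task,           # assumed to be defined earlier
  instances = NUM_INSTANCES,
  # Allow up to ceil(instances/hosts) instances per host so the job can
  # still be fully scheduled when some hosts are down, at the cost of a
  # less even spread.
  constraints = {'host': 'limit:%d'
                 % int(math.ceil(NUM_INSTANCES / float(NUM_HOSTS)))},
)]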

--
Zameer Manji
