Hi Ben,

Thank you for answering.

> > For frameworks in the same role on the other hand we choose to normalize
> > with the allocated resources
> 
> Within a role, the framework's share is evaluated using the *role*'s total
> allocation as a denominator. Were you referring to the role's total
> allocation when you said "allocated resources"?

Yes.

> I believe this was just to reflect the "total pool" we're sharing within.
> For roles, we're sharing the total cluster as a pool. For frameworks within
> a role, we're sharing the role's total allocation as a pool amongst the
> frameworks. Make sense?

Looking at the allocation loop, I see that while a role sorter uses the
actual cluster resources when computing a sort order, the total in the
picked framework sorter only seems to be updated with an `add` at the
end of the allocation loop, so at the very least the "total pool" of
resources within a single role seems to lag behind by one iteration.
Should this update be moved to the top of the loop?
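
To illustrate (a toy sketch with made-up numbers and hypothetical
names, not the actual sorter code): moving the update changes the
denominator that shares in the current iteration are evaluated against.

```python
# Toy sketch (hypothetical names, not the actual sorter code): the same
# allocation yields a different share depending on whether the role's
# pool is grown before or after the share is evaluated.

def share(alloc, total):
    # DRF share: the largest per-resource fraction of the pool.
    return max(alloc[r] / total[r] for r in alloc if total.get(r))

role_total = {"cpus": 30, "mem": 2}   # role's pool entering this iteration
grant = {"cpus": 10, "mem": 1}        # resources picked in this iteration
alloc = {"cpus": 20, "mem": 1}        # one framework's current allocation

# `add` at the end of the loop: the share still uses the stale pool.
print(share(alloc, role_total))       # max(20/30, 1/2) ~= 0.667

# `add` at the top of the loop: the pool already includes the grant.
updated = {r: role_total[r] + grant.get(r, 0) for r in role_total}
print(share(alloc, updated))          # max(20/40, 1/3) = 0.5
```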

> The sort ordering should be the same no matter which denominator you
> choose, since everyone gets the same denominator. i.e. 1,2,3 are ordered
> the same whether you're evaluating their share as 1/10,2/10,3/10 or
> 1/100,2/100,3/100, etc.

This seems to hold only as long as there is a single resource kind.
With multiple resource kinds we are not just dealing with a common
scale factor: the denominators determine which resource is dominant,
so DRF ends up comparing shares of different resources against each
other.

Here's a brief example of a cluster with two frameworks where we end up
with different dominant shares `f` depending on whether the frameworks
are in the same role or not.

- Setup:
  * cluster total: cpus:40; mem:100; disk:1000
  * cluster used:  cpus:30; mem:  2; disk:   5

  * framework 'a': used=cpus:20; mem:1; disk:1
  * framework 'b': used=cpus:10; mem:1; disk:4

- both frameworks in separate roles:
  * framework 'a', role 'A'; role shares: cpus:2/4; mem:1/100; disk:1/1000; f=2/4
  * framework 'b', role 'B'; role shares: cpus:1/4; mem:1/100; disk:4/1000; f=1/4

- both frameworks in the same role:
  * framework 'a': framework shares: cpus:2/3; mem:1/2; disk:1/5; f=2/3
  * framework 'b': framework shares: cpus:1/3; mem:1/2; disk:4/5; f=4/5

If each framework is in its own role we would allocate the next resource
to 'b'; if the frameworks are in the same role we would allocate to 'a'
instead. This is what I meant by

> It appears to me that by normalizing with the used resources inside a role
> we somehow bias allocations inside a role against frameworks with “unusual”
> usage vectors (relative to other frameworks in the same role). 

In this example we would penalize 'b' for having a usage vector very
different from 'a' (here: along the `disk` axis).
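
For reference, here is a short Python snippet (nothing Mesos-specific,
just the arithmetic from above) that reproduces both cases:

```python
# Reproduces the example above: the dominant share `f` of each
# framework under the two choices of denominator.

CLUSTER = {"cpus": 40, "mem": 100, "disk": 1000}
ROLE_ALLOC = {"cpus": 30, "mem": 2, "disk": 5}  # both usages combined

USED = {
    "a": {"cpus": 20, "mem": 1, "disk": 1},
    "b": {"cpus": 10, "mem": 1, "disk": 4},
}

def dominant_share(used, total):
    # DRF: the dominant share is the largest per-resource fraction.
    return max(used[r] / total[r] for r in used)

for name, used in USED.items():
    print(name,
          "separate roles:", dominant_share(used, CLUSTER),
          "same role:", dominant_share(used, ROLE_ALLOC))

# separate roles: f_a = 0.5   > f_b = 0.25 -> allocate to 'b' next
# same role:      f_a = 0.667 < f_b = 0.8  -> allocate to 'a' next
```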


Benjamin
