With hadoop-2.7, the RM rejects app submissions when the capacity required to
run the app master exceeds the cluster capacity. Fine Grained Scaling (FGS)
is affected by this problem because FGS sets a Node Manager's capacity to
(0G,0CPU) when the NodeManager registers with the RM and only expands the
NM's capacity with resource offers from mesos. Since each NM's capacity is
set to (0G,0CPU), the "cluster capacity" stays at (0G,0CPU), causing
submitted apps to be rejected by the RM. Although FGS expands the NM's
capacity with mesos offers, the probability of the cluster capacity exceeding
the AM container's capacity at the instant an app is submitted is still very
low.
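
To make the failure mode concrete, here is a minimal, self-contained Java
sketch of the check that fails when every NM has registered with zero
capacity. This is not Myriad or YARN code; the Resource type and the
(2G,1CPU) AM size are just illustrative assumptions.

    import java.util.Arrays;
    import java.util.List;

    public class ZeroCapacitySketch {

      // Illustrative stand-in for a (memory, cpu) capacity pair.
      static class Resource {
        final int memMB, vcores;
        Resource(int memMB, int vcores) { this.memMB = memMB; this.vcores = vcores; }
        Resource plus(Resource o) { return new Resource(memMB + o.memMB, vcores + o.vcores); }
        boolean fitsIn(Resource other) { return memMB <= other.memMB && vcores <= other.vcores; }
      }

      public static void main(String[] args) {
        // With FGS, every NM registers with (0G,0CPU), so the cluster capacity sums to zero.
        List<Resource> nmCapacities = Arrays.asList(new Resource(0, 0), new Resource(0, 0));
        Resource cluster = new Resource(0, 0);
        for (Resource nm : nmCapacities) {
          cluster = cluster.plus(nm);
        }

        // Hypothetical AM container request of (2G,1CPU).
        Resource amContainer = new Resource(2048, 1);

        // The submission is rejected because the AM request exceeds the cluster capacity.
        System.out.println(amContainer.fitsIn(cluster) ? "accepted" : "rejected"); // prints "rejected"
      }
    }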

A couple of options were evaluated to fix the above problem:

*Option #1*
- Let FGS not set the NM's capacity to (0G,0CPU) during the NM's
registration with the RM. Let FGS use mesos offers to expand the NM's
capacity beyond its initial capacity (this is what FGS does already). When
the mesos-offered capacity is used/relinquished by Myriad, bring the NM's
capacity back down to its initial capacity. (A rough sketch of this capacity
model follows below.)
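
Here is a minimal sketch of that capacity model in plain Java. It is not
Myriad's actual NM/RM interface; the class and method names are made up for
illustration. The idea is simply that the NM never advertises less than its
initial profile capacity, grows by each mesos offer Myriad accepts, and
shrinks back as those offers are relinquished.

    public class ElasticCapacitySketch {

      private final int initialMem, initialCpu;
      private int mem, cpu;

      ElasticCapacitySketch(int memMB, int vcores) {
        this.initialMem = this.mem = memMB;
        this.initialCpu = this.cpu = vcores;
      }

      // Expand the capacity the RM sees by the size of a mesos offer.
      void addOffer(int memMB, int vcores) {
        mem += memMB;
        cpu += vcores;
      }

      // Relinquish offered resources, never shrinking below the initial profile capacity.
      void releaseOffer(int memMB, int vcores) {
        mem = Math.max(initialMem, mem - memMB);
        cpu = Math.max(initialCpu, cpu - vcores);
      }

      public static void main(String[] args) {
        ElasticCapacitySketch nm = new ElasticCapacitySketch(3 * 1024, 2); // starts at (3G,2CPU)
        nm.addOffer(3 * 1024, 1);      // grows to (6G,3CPU)
        nm.releaseOffer(3 * 1024, 1);  // shrinks back to (3G,2CPU)
        System.out.printf("capacity: (%d MB, %d vcores)%n", nm.mem, nm.cpu);
      }
    }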

Pros:
  - App submissions won't be rejected, as NMs always have a certain minimum
capacity (== profile size).
  - NMs' capacities are flexible: NMs start with some initial capacity, grow
in size with mesos offers, and shrink back to the initial capacity.

Cons:
  - Hard to implement. The main problem is this:
    Let's say an NM registered with the RM with an initial capacity of
(3G,2CPU) and Myriad subsequently receives a new mesos offer worth (3G,1CPU).
If Myriad sets the NM's capacity to (6G,3CPU) and allows the RM to perform
scheduling, the RM can potentially allocate 3 containers of (2G,1CPU) each.
Once the containers are allocated, Myriad needs to figure out which of them
were
      a) allocated purely out of the NM's initial capacity,
      b) allocated purely out of the additional mesos offers, or
      c) allocated by combining the NM's initial capacity with additional
mesos offers.

    (c) is especially complex, since Myriad has to figure out the partial
resources consumed from the mesos offers and hold on to those resources as
long as the YARN containers utilizing them are alive. (A worked example
follows below.)
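
Here is the example above written out as a small Java sketch, to show why
case (c) forces Myriad to track partial offer consumption. The arithmetic
uses the numbers above; the greedy attribution order (initial capacity
first) is just an assumption for illustration, not Myriad code.

    public class OfferAttributionSketch {
      public static void main(String[] args) {
        // {memMB, vcores}; numbers taken from the example above.
        int[] initial   = {3 * 1024, 2}; // NM's initial capacity (3G,2CPU)
        int[] offer     = {3 * 1024, 1}; // new mesos offer (3G,1CPU)
        int[] container = {2 * 1024, 1}; // each allocated container (2G,1CPU)

        // The RM sees (6G,3CPU) and allocates 3 such containers. Attributing them
        // greedily against the initial capacity first:
        //   container 1: fits in the initial capacity -> case (a); initial has (1G,1CPU) left
        //   container 2: initial can no longer hold 2G, but the offer can -> case (b);
        //                offer has (1G,0CPU) left
        //   container 3: neither pool alone has (2G,1CPU); it takes (1G,1CPU) from the
        //                initial capacity and 1G from the offer -> case (c)
        int memLeftInInitial = initial[0] - container[0];       // 1024 MB left after container 1
        int memLeftInOffer   = offer[0] - container[0];         // 1024 MB left after container 2
        int memFromOffer     = container[0] - memLeftInInitial; // container 3 borrows 1024 MB

        System.out.println("container 3 uses " + memLeftInInitial + " MB + 1 vcore of initial"
            + " capacity and " + memFromOffer + " MB of the remaining " + memLeftInOffer
            + " MB offer; Myriad must hold that offer memory until container 3 exits");
      }
    }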

*Option #2*
1. Introduce the notion of a new "zero" profile for NMs. NMs launched with
this profile register with the RM with (0G,0CPU). Existing profile
definitions (low/medium/high) are left intact.
2. Allow FGS to be applicable only if an NM registers with (0G,0CPU)
capacity. With this, all the containers allocated to a zero profile NM are
always backed by resources offered by mesos.
3. Let Myriad start a configured number of NMs (default==1) with a
configured non-zero profile (default==low). This ensures the "cluster
capacity" is never (0G,0CPU) and prevents apps from being rejected. (A rough
sketch of points 1-3 follows below.)
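
As a rough illustration of points 1-3, here is a hedged Java sketch. Only
the zero profile's (0G,0CPU) size and the defaults (1 NM, low profile) come
from the proposal above; the low/medium/high sizes below are placeholder
assumptions.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ZeroProfileSketch {

      public static void main(String[] args) {
        // Profile name -> {memMB, vcores}. Only the "zero" entry is dictated by the
        // proposal; the low/medium/high sizes below are made-up placeholders.
        Map<String, int[]> profiles = new LinkedHashMap<String, int[]>();
        profiles.put("zero",   new int[]{0, 0});
        profiles.put("low",    new int[]{2 * 1024, 1});
        profiles.put("medium", new int[]{4 * 1024, 2});
        profiles.put("high",   new int[]{8 * 1024, 4});

        // Point 3: launch a configured number of non-zero NMs (default 1, default profile
        // "low"), so the cluster capacity the RM sees never drops to (0G,0CPU).
        int nonZeroNMCount = 1;
        String defaultProfile = "low";
        int[] size = profiles.get(defaultProfile);
        System.out.printf("minimum cluster capacity: (%d MB, %d vcores)%n",
            nonZeroNMCount * size[0], nonZeroNMCount * size[1]);

        // Point 2: FGS applies only to NMs that registered with (0G,0CPU), so every
        // container on those NMs is backed entirely by mesos offers.
        for (Map.Entry<String, int[]> e : profiles.entrySet()) {
          boolean fgsEligible = e.getValue()[0] == 0 && e.getValue()[1] == 0;
          System.out.println(e.getKey() + " profile NM eligible for FGS: " + fgsEligible);
        }
      }
    }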

Pros:
  - App submissions won't be rejected, as the "cluster capacity" is never
(0G,0CPU).
  - The YARN cluster would always have a certain minimum capacity (== sum of
the capacities of NMs launched with non-zero profiles).
  - The YARN cluster capacity remains flexible, since the zero profile NMs
grow and shrink in size with mesos offers.

Cons:
  - Not a huge con, but one concern is that since some NMs are of fixed
size and some NMs are flexible, an admin might want to be able to control NM
placement wisely. We already have an issue raised to track this, perhaps for
a different context, but it's certainly applicable here as well:
https://github.com/mesos/myriad/issues/105

I tried Option #1 last week and abandoned it due to its complexity. I have
started implementing Option #2 (point 3 above is still pending).

I'm happy to incorporate any feedback from folks before sending out the code
for review.

Thanks,
Santosh
