Why not just add a flag to disable the admission control logic in RMs? This
same concern came up in the Kubernetes-Mesos framework, which uses a
similar "placeholder task" architecture to grow/shrink the executor's
container as new tasks/pods are launched. We spoke to the K8s team, and
they agreed that the admission-control check is not critical to the
functionality of their API server (the task-launch API), so it was kept
behind a flag.
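
For concreteness, the kind of guard I have in mind is roughly the
following (a minimal standalone sketch; the flag, names, and numbers are
all hypothetical, not actual YARN code):

    // Minimal sketch of gating the RM's admission-control check behind a
    // flag. All names here are hypothetical, not actual YARN classes/keys.
    public class AdmissionCheckSketch {
        // In YARN this would come from configuration; hardcoded here.
        static final boolean ENFORCE_CAPACITY_CHECK = true; // proposed flag

        static void admit(long amMemMb, int amVcores,
                          long clusterMemMb, int clusterVcores) {
            if (ENFORCE_CAPACITY_CHECK
                    && (amMemMb > clusterMemMb || amVcores > clusterVcores)) {
                throw new IllegalStateException(
                    "Rejected: AM request exceeds current cluster capacity");
            }
            // Otherwise the app proceeds to scheduling as usual.
        }

        public static void main(String[] args) {
            // With FGS, all NMs register with (0G,0CPU), so this rejects:
            admit(1024, 1, 0, 0);
            // Flipping ENFORCE_CAPACITY_CHECK to false lets it through.
        }
    }
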
I know we don't want to depend on forks of either project, but we can push
changes into Mesos/YARN when necessary.

On Thu, Jul 9, 2015 at 1:59 PM, Santosh Marella <[email protected]>
wrote:

> With hadoop-2.7, the RM rejects app submissions when the capacity required
> to run the app master exceeds the cluster capacity. Fine Grained Scaling
> (FGS) is affected by this problem because FGS sets a Node Manager's
> capacity to (0G,0CPU) when the NM registers with the RM, and expands the
> NM's capacity with resource offers from Mesos. Since each NM's capacity
> starts at (0G,0CPU), the "cluster capacity" stays at (0G,0CPU), causing
> submitted apps to be rejected by the RM. Although FGS expands NM
> capacities with Mesos offers, the probability that the cluster capacity
> exceeds the AM container's capacity at the instant an app is submitted is
> still very low.
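>
> To make the flow concrete, here is a minimal sketch of the behavior just
> described (field and method names are illustrative, not Myriad's actual
> code):
>
>     // Illustrative sketch of the FGS capacity flow; not actual Myriad code.
>     public class FgsCapacitySketch {
>         long nmMemMb = 0;   // FGS registers the NM with (0G,0CPU)
>         int  nmVcores = 0;
>
>         void onMesosOffer(long memMb, int vcores) {
>             nmMemMb += memMb;     // expand the NM's capacity with the offer
>             nmVcores += vcores;
>         }
>
>         void onOfferRelinquished(long memMb, int vcores) {
>             nmMemMb -= memMb;     // shrink back when the offer is returned
>             nmVcores -= vcores;
>         }
>     }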
>
> A couple of options were evaluated to fix this problem:
>
> *Option #1*
> - Let FGS not set the NM's capacity to (0G,0CPU) during the NM's
> registration with the RM. Let FGS use Mesos offers to expand the NM's
> capacity beyond its initial capacity (this is what FGS does already). When
> the Mesos-offered capacity is used/relinquished by Myriad, the NM's
> capacity is brought back down to its initial capacity.
>
> Pros:
>   - App submissions won't be rejected, as NMs always have a certain
> minimum capacity (== profile size).
>   - NM capacities are flexible: NMs start with some initial capacity, grow
> with Mesos offers, and shrink back to the initial capacity.
>
> Cons:
>   - Hard to implement. The main problem is this:
>     Let's say an NM registered with the RM with an initial capacity of
> (3G,2CPU), and Myriad subsequently receives a new offer worth (3G,1CPU).
> If Myriad sets the NM's capacity to (6G,3CPU) and allows the RM to perform
> scheduling, the RM can potentially allocate 3 containers of (2G,1CPU)
> each. Once the containers are allocated, Myriad needs to figure out which
> of these containers were
>       a) allocated purely from the NM's initial capacity,
>       b) allocated purely from additional Mesos offers, or
>       c) allocated by combining the NM's initial capacity with additional
> Mesos offers.
>
>     (c) is especially complex, since Myriad has to figure out the partial
> resources consumed from the Mesos offers and hold on to those resources as
> long as the YARN containers using them are alive.
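>
> To see why the attribution is ambiguous, consider one valid accounting of
> the (6G,3CPU) allocation above (each container is (2G,1CPU)):
>
>       container 1: 2G + 1CPU from the initial (3G,2CPU)       -> case (a)
>       container 2: 1G + 1CPU from initial, 1G from the offer  -> case (c)
>       container 3: 2G + 1CPU from the (3G,1CPU) offer         -> case (b)
>
> An equally valid accounting assigns the pieces differently, so Myriad
> cannot uniquely determine which offer resources to hold and for how long.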
>
> *Option #2*
> 1. Introduce the notion of a new "zero" profile for NMs (sketched below).
> NMs launched with this profile register with the RM with (0G,0CPU).
> Existing profile definitions (low/medium/high) are left intact.
> 2. Make FGS applicable only to NMs that register with (0G,0CPU) capacity.
> With this, all containers allocated to a zero-profile NM are always backed
> by resources offered by Mesos.
> 3. Let Myriad start a configured number of NMs (default == 1) with a
> configured non-zero profile (default == low). This ensures the "cluster
> capacity" is never (0G,0CPU) and prevents apps from being rejected.
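>
> As a rough sketch of the "zero" profile in point 1 (class name and profile
> sizes are illustrative, not the actual Myriad code):
>
>     // Illustrative only; not Myriad's actual profile classes or sizes.
>     public final class NMProfile {
>         final String name;
>         final long memMb;
>         final int vcores;
>
>         NMProfile(String name, long memMb, int vcores) {
>             this.name = name;
>             this.memMb = memMb;
>             this.vcores = vcores;
>         }
>
>         // Existing profiles stay intact; "zero" is the new FGS-only one.
>         static final NMProfile ZERO   = new NMProfile("zero",   0,    0);
>         static final NMProfile LOW    = new NMProfile("low",    2048, 1);
>         static final NMProfile MEDIUM = new NMProfile("medium", 4096, 2);
>         static final NMProfile HIGH   = new NMProfile("high",   8192, 4);
>
>         // FGS applies only to NMs that register with (0G,0CPU) (point 2).
>         boolean fgsEligible() {
>             return memMb == 0 && vcores == 0;
>         }
>     }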
>
> Pros:
>   - App submissions won't be rejected, as the "cluster capacity" is never
> (0G,0CPU).
>   - The YARN cluster always has a certain minimum capacity (== the sum of
> the capacities of NMs launched with non-zero profiles).
>   - The YARN cluster capacity remains flexible, since the zero-profile NMs
> grow and shrink in size.
>
> Cons:
>   - Not a huge con, but one concern is that since some NMs are of fixed
> size and some are flexible, an admin might want to control NM placement
> wisely. We already have an issue raised to track this, perhaps in a
> different context, but it's certainly applicable here as well:
> https://github.com/mesos/myriad/issues/105
>
> I tried Option #1 last week and abandoned it due to its complexity. I've
> started implementing Option #2 (point 3 above is still pending).
>
> I'm happy to incorporate any feedback from folks before sending out the
> code for review.
>
> Thanks,
> Santosh
>
