a) Give the executor at least a minimal 0.01 CPU and 1 MB RAM, since the executor itself will use some resources, and Mesos gets confused when an executor claims no resources. See https://issues.apache.org/jira/browse/MESOS-1807

b) I agree 100% with needing a way to enable/disable FGS vs. CGS, but I don't think I understand your "zero profile" use case. I'd recommend going with a simple enable/disable flag for the MVP; we can extend it later if/when necessary.

c) Interesting. It seems like a hacky workaround for the admission control problem, but I'm intrigued by its possibilities for other scenarios. We should still investigate pushing a disable flag into YARN.

> YARN-2604, YARN-3079. YARN-2604 seems to have been added because of a
> genuine problem where an app's AM container size exceeds the size of the
> largest NM node in the cluster.

This still needs a way to be disabled, because an auto-scaling Hadoop cluster wouldn't worry about insufficient capacity; it would just make more.
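To make (a) concrete, here's roughly the shape of it with the Mesos Java proto API (illustrative only, not Myriad's actual code; the executor id and command are placeholders):

    import org.apache.mesos.Protos.*;

    // Advertise a small non-zero floor for the executor itself, so Mesos
    // never sees an executor claiming zero resources (MESOS-1807).
    ExecutorInfo executor = ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue("myriad-executor"))
        .setCommand(CommandInfo.newBuilder().setValue("..."))  // launch command elided
        .addResources(Resource.newBuilder()
            .setName("cpus")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(0.01)))
        .addResources(Resource.newBuilder()
            .setName("mem")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(1.0)))  // MB
        .build();

(I've also appended rough sketches of the config-side ideas below the quoted thread.)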
On Fri, Jul 10, 2015 at 11:13 AM, Santosh Marella <[email protected]> wrote:
> Good point. YARN seems to have added this admission control as part of YARN-2604 and YARN-3079. YARN-2604 seems to have been added because of a genuine problem where an app's AM container size exceeds the size of the largest NM node in the cluster. They also have a configurable interval that controls how long admission control should be relaxed after the RM's startup (yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms). This was added to avoid rejecting apps submitted after the RM (re)starts and before any NMs register with the RM.
>
> One option is to use a larger value for the above configuration parameter for Myriad-based YARN clusters. However, it might be worth examining in detail the effects of doing that, since the same config param is also used in the "work preserving RM restart" feature.
>
> Another option is to add a flag that disables admission control in the RM and push the change into YARN.
>
> In addition to (or irrespective of) the above, I think the following problems should still be fixed in Myriad:
>
> a. FGS shouldn't set an NM's capacity to (0G,0CPU) during registration:
> If an NM is launched with a "medium" profile and FGS sets its capacity to (0G,0CPU), the RM will never schedule containers on this NM unless FGS expands the capacity with additional Mesos offers. Essentially, the capacity used for launching the NM will not be utilized at all. On the other hand, not setting the capacity to (0G,0CPU) is also a problem, because once the RM allocates containers, FGS can't (easily) tell whether they were allocated due to the NM's initial capacity or due to additional offers received from Mesos.
>
> b. Configuration to enable/disable FGS:
> Currently, there is no configuration that controls whether Myriad uses coarse-grained scaling or fine-grained scaling. If you run Myriad off of the "phase1" branch, you get coarse-grained scaling (CGS). If you run off of "branch_14", you get FGS. Since we want branch_14 to be merged into phase1 at some point, we need a way to enable/disable FGS. One option might be a configuration flag that selects either CGS or FGS. However, I feel both features are pretty useful, and a co-existence of the two would be ideal. Hence, introducing a new "zero" profile (or we could name it a "verticallyScalable" profile or similar) and making FGS applicable only to that profile lets an admin use just FGS, just CGS, or a combination of both.
>
> c. Specify (profiles, instances) at startup:
> Currently, "flexup" is the only way to add more NMs. It would be convenient to make the number of instances of each profile configurable in the .yml file. If an admin chooses to have a few NMs with FGS and a few with CGS, it's a lot easier to specify that before starting the RM. Myriad could also exploit this configuration to provide a reasonable workaround to the admission control problem: enforce at least 1 NM of non-zero size.
>
> Thanks,
> Santosh
>
> On Fri, Jul 10, 2015 at 12:32 AM, Adam Bordelon <[email protected]> wrote:
> > Why not just add a flag to disable the admission control logic in the RM? This same concern came up in the Kubernetes-Mesos framework, which uses a similar "placeholder task" architecture to grow/shrink the executor's container as new tasks/pods are launched.
> > We spoke to the K8s team, and they agreed that the admission control check is not critical to the functionality of their API server (task launch API), so it was kept behind a flag.
> > I know we don't want to depend on forks of either project, but we can push changes into Mesos/YARN when necessary.
> >
> > On Thu, Jul 9, 2015 at 1:59 PM, Santosh Marella <[email protected]> wrote:
> > > With hadoop-2.7, the RM rejects app submissions when the capacity required to run the app master exceeds the cluster capacity. Fine Grained Scaling (FGS) is affected by this because FGS sets a NodeManager's capacity to (0G,0CPU) when the NodeManager registers with the RM, and then expands the NM's capacity with resource offers from Mesos. Since each NM's capacity is set to (0G,0CPU), the "cluster capacity" stays at (0G,0CPU), causing submitted apps to be rejected by the RM. Although FGS expands the NMs' capacities with Mesos offers, the probability that the cluster capacity exceeds the AM container's capacity at the instant an app is submitted is still very low.
> > >
> > > A couple of options were evaluated to fix the above problem:
> > >
> > > *Option #1*
> > > - Let FGS not set the NM's capacity to (0G,0CPU) during the NM's registration with the RM. Let FGS use Mesos offers to expand the NM's capacity beyond its initial capacity (this is what FGS does already). When the Mesos-offered capacity is used/relinquished by Myriad, the NM's capacity is brought back down to its initial capacity.
> > >
> > > Pros:
> > > - App submissions won't be rejected, as NMs always have a certain minimum capacity (== profile size).
> > > - NM capacities are flexible. NMs start with some initial capacity, grow in size with Mesos offers, and shrink back to the initial capacity.
> > >
> > > Cons:
> > > - Hard to implement. The main problem is this:
> > > Let's say an NM registered with the RM with an initial capacity of (3G,2CPU), and Myriad subsequently receives a new offer worth (3G,1CPU). If Myriad sets the NM's capacity to (6G,3CPU) and allows the RM to perform scheduling, the RM can potentially allocate 3 containers of (2G,1CPU) each. Once the containers are allocated, Myriad needs to figure out which of these containers were
> > > a) allocated purely from the NM's initial capacity.
> > > b) allocated purely from the additional Mesos offers.
> > > c) allocated from a combination of the NM's initial capacity and additional Mesos offers.
> > >
> > > (c) is especially complex, since Myriad has to figure out the partial resources consumed from the Mesos offers and hold on to those resources as long as the YARN containers utilizing them are alive.
> > >
> > > *Option #2*
> > > 1. Introduce the notion of a new "zero" profile for NMs. NMs launched with this profile register with the RM with (0G,0CPU). Existing profile definitions (low/medium/high) are left intact.
> > > 2. Allow FGS to apply only if an NM registers with (0G,0CPU) capacity. With this, all the containers allocated to a zero-profile NM are always due to resources offered by Mesos.
> > > 3. Let Myriad start a configured number of NMs (default==1) with a configured profile (default==low). This ensures the "cluster capacity" is never (0G,0CPU), preventing the rejection of apps.
> > >
> > > Pros:
> > > - App submissions won't be rejected, as the "cluster capacity" is never (0G,0CPU).
> > > - The YARN cluster would always have a certain minimum capacity (== sum of the capacities of the NMs launched with non-zero profiles).
> > > - The YARN cluster capacity remains flexible, since the zero-profile NMs grow and shrink in size.
> > >
> > > Cons:
> > > - Not a huge con, but one concern is that since some NMs are of fixed size and some NMs are flexible, an admin might want to be able to control the NM placement wisely. We already have an issue raised to track this, perhaps for a different context, but it's certainly applicable here as well. The issue is: https://github.com/mesos/myriad/issues/105
> > >
> > > I tried Option #1 last week and abandoned it due to its complexity. I started implementing #2 (point 3 above is still pending).
> > >
> > > I'm happy to include any feedback from folks before sending out the code for review.
> > >
> > > Thanks,
> > > Santosh
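P.S. To make Santosh's (b)/(c) concrete, here's roughly how it could look in the .yml. All key names below are illustrative, made up for discussion, not an existing schema:

    # Illustrative sketch only; key names are not the current config schema.
    profiles:
      zero:          # FGS-only: registers with (0G,0CPU), grows/shrinks via Mesos offers
        cpu: 0
        mem: 0
      low:
        cpu: 2
        mem: 2048    # MB
    nmInstances:     # NMs to launch at RM startup (Santosh's point c)
      zero: 3        # purely fine-grained NMs
      low: 1         # >= 1 non-zero NM keeps "cluster capacity" above
                     # (0G,0CPU), so admission control doesn't reject apps

An admin who wants pure CGS would just set the zero-profile count to 0; pure FGS would be all zero-profile NMs plus the one mandatory non-zero NM as the admission control workaround.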

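And the first option Santosh mentioned (keeping admission control relaxed for longer after RM startup) is just a yarn-site.xml property; the value below is arbitrary:

    <!-- Keep the RM's admission control relaxed for 30 minutes after startup.
         Caveat from Santosh's mail: this same parameter is also used by the
         work-preserving RM restart feature. -->
    <property>
      <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
      <value>1800000</value>
    </property>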