a) Give the executor at least a minimal 0.01 CPU and 1 MB RAM, since the executor itself will use some resources, and Mesos gets confused when an executor claims no resources. See https://issues.apache.org/jira/browse/MESOS-1807

b) I agree 100% with needing a way to enable/disable FGS vs. CGS, but I don't think I understand your "zero profile" use case. I'd recommend going with a simple enable/disable flag for the MVP; we can extend it later if/when necessary.

c) Interesting. It seems like a hacky workaround for the admission control problem, but I'm intrigued by its possibilities for other scenarios. We should still investigate pushing a disable flag into YARN.

> YARN-2604, YARN-3079. YARN-2604 seems to have been added because of a
> genuine problem where an app's AM container size exceeds the size of the
> largest NM node in the cluster.

This still needs a way to be disabled, because an auto-scaling Hadoop cluster wouldn't worry about insufficient capacity; it would just make more.
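To make (a) concrete, here's roughly the shape of it with the Mesos Java proto API (illustrative only, not Myriad's actual code; the executor id and command are placeholders):

    import org.apache.mesos.Protos.*;

    // Advertise a small non-zero floor for the executor itself, so Mesos
    // never sees an executor claiming zero resources (MESOS-1807).
    ExecutorInfo executor = ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue("myriad-executor"))
        .setCommand(CommandInfo.newBuilder().setValue("..."))  // launch command elided
        .addResources(Resource.newBuilder()
            .setName("cpus")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(0.01)))
        .addResources(Resource.newBuilder()
            .setName("mem")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(1.0)))  // MB
        .build();

(I've also appended rough sketches of the config-side ideas below the quoted thread.)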
On Fri, Jul 10, 2015 at 11:13 AM, Santosh Marella <[email protected]> wrote:
> Good point. YARN seems to have added this admission control as part of YARN-2604 and YARN-3079. YARN-2604 seems to have been added because of a genuine problem where an app's AM container size exceeds the size of the largest NM node in the cluster. They also have a configurable interval that controls how long admission control should be relaxed after the RM's startup (yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms). This was added to avoid rejecting apps submitted after the RM (re)starts and before any NMs register with the RM.
>
> One option is to use a larger value for the above configuration parameter for Myriad-based YARN clusters. However, it might be worth examining in detail the effects of doing that, since the same config param is also used in the "work preserving RM restart" feature.
>
> Another option is to add a flag that disables admission control in the RM and push the change into YARN.
>
> In addition to (or irrespective of) the above, I think the following problems should still be fixed in Myriad:
>
> a. FGS shouldn't set an NM's capacity to (0G,0CPU) during registration:
> If an NM is launched with a "medium" profile and FGS sets its capacity to (0G,0CPU), the RM will never schedule containers on this NM unless FGS expands the capacity with additional Mesos offers. Essentially, the capacity used for launching the NM will not be utilized at all. On the other hand, not setting the capacity to (0G,0CPU) is also a problem, because once the RM allocates containers, FGS can't (easily) tell whether they were allocated due to the NM's initial capacity or due to additional offers received from Mesos.
>
> b. Configuration to enable/disable FGS:
> Currently, there is no configuration that controls whether Myriad uses coarse-grained scaling or fine-grained scaling. If you run Myriad off of the "phase1" branch, you get coarse-grained scaling (CGS). If you run off of "branch_14", you get FGS. Since we want branch_14 to be merged into phase1 at some point, we need a way to enable/disable FGS. One option might be a configuration flag that selects either CGS or FGS. However, I feel both features are pretty useful, and a co-existence of the two would be ideal. Hence, introducing a new "zero" profile (or we could name it a "verticallyScalable" profile or similar) and making FGS applicable only to that profile lets an admin use just FGS, just CGS, or a combination of both.
>
> c. Specify (profiles, instances) at startup:
> Currently, "flexup" is the only way to add more NMs. It would be convenient to make the number of instances of each profile configurable in the .yml file. If an admin chooses to have a few NMs with FGS and a few with CGS, it's a lot easier to specify that before starting the RM. Myriad could also exploit this configuration to provide a reasonable workaround to the admission control problem: enforce at least 1 NM of non-zero size.
>
> Thanks,
> Santosh
>
> On Fri, Jul 10, 2015 at 12:32 AM, Adam Bordelon <[email protected]> wrote:
> > Why not just add a flag to disable the admission control logic in the RM? This same concern came up in the Kubernetes-Mesos framework, which uses a similar "placeholder task" architecture to grow/shrink the executor's container as new tasks/pods are launched.
> > We spoke to the K8s team, and they agreed that the admission control check is not critical to the functionality of their API server (task launch API), so it was kept behind a flag.
> > I know we don't want to depend on forks of either project, but we can push changes into Mesos/YARN when necessary.
> >
> > On Thu, Jul 9, 2015 at 1:59 PM, Santosh Marella <[email protected]> wrote:
> > > With hadoop-2.7, the RM rejects app submissions when the capacity required to run the app master exceeds the cluster capacity. Fine Grained Scaling (FGS) is affected by this because FGS sets a NodeManager's capacity to (0G,0CPU) when the NodeManager registers with the RM, and then expands the NM's capacity with resource offers from Mesos. Since each NM's capacity is set to (0G,0CPU), the "cluster capacity" stays at (0G,0CPU), causing submitted apps to be rejected by the RM. Although FGS expands the NMs' capacities with Mesos offers, the probability that the cluster capacity exceeds the AM container's capacity at the instant an app is submitted is still very low.
> > >
> > > A couple of options were evaluated to fix the above problem:
> > >
> > > *Option #1*
> > > - Let FGS not set the NM's capacity to (0G,0CPU) during the NM's registration with the RM. Let FGS use Mesos offers to expand the NM's capacity beyond its initial capacity (this is what FGS does already). When the Mesos-offered capacity is used/relinquished by Myriad, the NM's capacity is brought back down to its initial capacity.
> > >
> > > Pros:
> > > - App submissions won't be rejected, as NMs always have a certain minimum capacity (== profile size).
> > > - NM capacities are flexible. NMs start with some initial capacity, grow in size with Mesos offers, and shrink back to the initial capacity.
> > >
> > > Cons:
> > > - Hard to implement. The main problem is this:
> > > Let's say an NM registered with the RM with an initial capacity of (3G,2CPU), and Myriad subsequently receives a new offer worth (3G,1CPU). If Myriad sets the NM's capacity to (6G,3CPU) and allows the RM to perform scheduling, the RM can potentially allocate 3 containers of (2G,1CPU) each. Once the containers are allocated, Myriad needs to figure out which of these containers were
> > > a) allocated purely from the NM's initial capacity.
> > > b) allocated purely from the additional Mesos offers.
> > > c) allocated from a combination of the NM's initial capacity and additional Mesos offers.
> > >
> > > (c) is especially complex, since Myriad has to figure out the partial resources consumed from the Mesos offers and hold on to those resources as long as the YARN containers utilizing them are alive.
> > >
> > > *Option #2*
> > > 1. Introduce the notion of a new "zero" profile for NMs. NMs launched with this profile register with the RM with (0G,0CPU). Existing profile definitions (low/medium/high) are left intact.
> > > 2. Allow FGS to apply only if an NM registers with (0G,0CPU) capacity. With this, all the containers allocated to a zero-profile NM are always due to resources offered by Mesos.
> > > 3. Let Myriad start a configured number of NMs (default==1) with a configured profile (default==low). This ensures the "cluster capacity" is never (0G,0CPU), preventing the rejection of apps.
> > >
> > > Pros:
> > > - App submissions won't be rejected, as the "cluster capacity" is never (0G,0CPU).
> > > - The YARN cluster would always have a certain minimum capacity (== sum of the capacities of the NMs launched with non-zero profiles).
> > > - The YARN cluster capacity remains flexible, since the zero-profile NMs grow and shrink in size.
> > >
> > > Cons:
> > > - Not a huge con, but one concern is that since some NMs are of fixed size and some NMs are flexible, an admin might want to be able to control the NM placement wisely. We already have an issue raised to track this, perhaps for a different context, but it's certainly applicable here as well. The issue is: https://github.com/mesos/myriad/issues/105
> > >
> > > I tried Option #1 last week and abandoned it due to its complexity. I started implementing #2 (point 3 above is still pending).
> > >
> > > I'm happy to include any feedback from folks before sending out the code for review.
> > >
> > > Thanks,
> > > Santosh
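P.S. To make Santosh's (b)/(c) concrete, here's roughly how it could look in the .yml. All key names below are illustrative, made up for discussion, not an existing schema:

    # Illustrative sketch only; key names are not the current config schema.
    profiles:
      zero:          # FGS-only: registers with (0G,0CPU), grows/shrinks via Mesos offers
        cpu: 0
        mem: 0
      low:
        cpu: 2
        mem: 2048    # MB
    nmInstances:     # NMs to launch at RM startup (Santosh's point c)
      zero: 3        # purely fine-grained NMs
      low: 1         # >= 1 non-zero NM keeps "cluster capacity" above
                     # (0G,0CPU), so admission control doesn't reject apps

An admin who wants pure CGS would just set the zero-profile count to 0; pure FGS would be all zero-profile NMs plus the one mandatory non-zero NM as the admission control workaround.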

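And the first option Santosh mentioned (keeping admission control relaxed for longer after RM startup) is just a yarn-site.xml property; the value below is arbitrary:

    <!-- Keep the RM's admission control relaxed for 30 minutes after startup.
         Caveat from Santosh's mail: this same parameter is also used by the
         work-preserving RM restart feature. -->
    <property>
      <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
      <value>1800000</value>
    </property>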