Myriad DCOS Certification Checklist

2015-07-10 Thread Adam Bordelon
Now, as Mesosphere ramps up our DCOS Service Certification program, I would
like to share this checklist and the linked DCOS Service/CLI Specifications
with the Myriad mailing list. (Ping me with your preferred email if you
want checklist edit access.)

https://docs.google.com/document/d/1gr5hgFHgh2ZBAF4GkZNph8Y1idF_UsyiwGaImXsa6vM/edit

Many of these items are general requirements/recommendations for any
production-ready Mesos framework, and many of them are already satisfied by
Myriad or are currently in progress. I know I gave some vague recommendations
before, but now it's all concrete and documented. Dig in! :)

Cheers,
-Adam-


Re: Fine Grained Scaling and Hadoop-2.7.

2015-07-10 Thread Santosh Marella
 a) Give the executor at least a minimal 0.01cpu, 1MB RAM

Myriad does this already. The problem is not with the executor's capacity.
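
For anyone following along, this is roughly what "give the executor a
minimal allocation" looks like with the Mesos Java API. It's a sketch, not
Myriad's actual code; the executor id and command below are placeholders:

    import org.apache.mesos.Protos;

    // Sketch only: attach a small, non-zero allocation to the executor so
    // Mesos never sees a zero-resource executor (cf. MESOS-1807).
    final class MinimalExecutorSketch {
      static Protos.Resource scalar(String name, double value) {
        return Protos.Resource.newBuilder()
            .setName(name)
            .setType(Protos.Value.Type.SCALAR)
            .setScalar(Protos.Value.Scalar.newBuilder().setValue(value))
            .build();
      }

      static Protos.ExecutorInfo minimalExecutor() {
        return Protos.ExecutorInfo.newBuilder()
            .setExecutorId(Protos.ExecutorID.newBuilder().setValue("myriad-executor"))
            .setCommand(Protos.CommandInfo.newBuilder().setValue("./run-executor.sh"))
            .addResources(scalar("cpus", 0.01)) // executor's own cpu share
            .addResources(scalar("mem", 1))     // executor's own memory, in MB
            .build();
      }
    }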

 b) ... I don't think I understand your zero profile use case

Let's take an example. Say the low profile corresponds to (2G,1CPU). When
Myriad wants to launch an NM with the low profile, it waits for a Mesos
offer that can hold an executor + a Java process for the NM + a (2G,1CPU)
capacity that the NM can advertise to the RM for launching future YARN
containers. With CGS, when the NM registers with the RM, the YARN scheduler
believes the NM has (2G,1CPU) and hence can allocate containers worth
(2G,1CPU) when apps require containers.
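
To make that arithmetic concrete, here is a toy sketch of what a single
offer has to cover before a low-profile NM can launch. The executor and
NM-JVM overheads below are made-up numbers for illustration, not Myriad's
actual defaults:

    // Toy arithmetic only; the overhead constants are assumptions.
    final class OfferSizeSketch {
      static final double EXECUTOR_CPUS = 0.01; // assumed executor overhead
      static final double EXECUTOR_MEM  = 1;    // MB
      static final double NM_JVM_CPUS   = 0.2;  // assumed NM JVM footprint
      static final double NM_JVM_MEM    = 1024; // MB

      /** Minimum single offer needed to launch an NM with the given profile. */
      static double[] minOfferFor(double profileMemMb, double profileCpus) {
        return new double[] {
            EXECUTOR_MEM + NM_JVM_MEM + profileMemMb,  // total memory (MB)
            EXECUTOR_CPUS + NM_JVM_CPUS + profileCpus  // total cpus
        };
      }
      // e.g. the low profile (2048 MB, 1 CPU) needs roughly (3073 MB, 1.21 CPU)
      // in one offer; under CGS the NM then advertises (2048 MB, 1 CPU) to the RM.
    }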

With FGS, the YARN scheduler believes the NM has (0G,0CPU). This is because
FGS intercepts the NM's registration with the RM and sets the NM's
advertised capacity to (0G,0CPU), although the NM was originally started
with (2G,1CPU). At this point, the YARN scheduler cannot allocate containers
to this NM. Subsequently, when Mesos offers resources on the same slave
node, FGS increases the capacity of the NM and notifies the RM that the NM
now has capacity available. For example, if (5G,4CPU) are offered to Myriad,
then FGS notifies the RM that the NM now has (5G,4CPU), and the RM can
allocate containers worth (5G,4CPU) on this NM. If we now count the total
resources Myriad has consumed from the given slave node, we observe that
Myriad never utilizes the (2G,1CPU) [low profile size] that was obtained at
the NM's launch time. The notion of a zero profile tries to eliminate this
wastage by allowing the NM to be launched with an advertisable capacity of
(0G,0CPU) in the first place.
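
A tiny model of that bookkeeping, just to spell out the sequence (this is a
sketch of the idea, not the actual FGS code):

    import org.apache.hadoop.yarn.api.records.Resource;

    // Sketch of the FGS capacity bookkeeping described above.
    final class FgsCapacityModel {
      // What the RM believes this NM can hold.
      private Resource advertised = Resource.newInstance(0, 0);

      // FGS intercepts the NM's registration: the profile the NM was launched
      // with (e.g. 2048 MB, 1 CPU) is ignored and the RM sees (0, 0).
      void onNodeManagerRegistration(Resource launchProfile) {
        advertised = Resource.newInstance(0, 0);
      }

      // When Mesos offers resources on the same slave and Myriad accepts them,
      // FGS grows the advertised capacity and notifies the RM.
      void onMesosOfferAccepted(int offeredMemMb, int offeredVcores) {
        advertised = Resource.newInstance(
            advertised.getMemory() + offeredMemMb,
            advertised.getVirtualCores() + offeredVcores);
        // e.g. a (5120 MB, 4 CPU) offer takes the RM's view from (0, 0) to
        // (5120, 4); the (2048 MB, 1 CPU) obtained at launch time is never
        // advertised, which is exactly the wastage a zero profile avoids.
      }
    }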

Why does FGS change the NM's initial capacity from (2G,1CPU) to (0G,0CPU)?
That's the way it has been until now, but it need not be. FGS could choose
not to reset the NM's capacity to (0G,0CPU) and instead allow the NM to grow
beyond its initial capacity of (2G,1CPU) and shrink back to (2G,1CPU). I
tried this approach recently, but there are other problems if we do that
(mentioned under option #1 in my first email) that seemed more complex than
going with a zero profile.

 c)... We should still investigate pushing a disable flag into YARN.
Absolutely. It totally makes sense to turn off the admission restriction for
auto-scaling YARN clusters.

FWIW, I will be sending out a PR shortly from my private issue_14 branch
with the changes I made so far. Comments/suggestions are welcome!

Thanks,
Santosh

On Fri, Jul 10, 2015 at 11:44 AM, Adam Bordelon a...@mesosphere.io wrote:

 a) Give the executor at least a minimal 0.01cpu, 1MB RAM, since the
 executor itself will use some resources, and Mesos gets confused when the
 executor claims no resources. See
 https://issues.apache.org/jira/browse/MESOS-1807
 b) I agree 100% with needing a way to enable/disable FGS vs. CGS, but I
 don't think I understand your zero profile use case. I'd recommend going
 with a simple enable/disable flag for the MVP, and then we can extend it
 later if/when necessary.
 c) Interesting. Seems like a hacky workaround for the admission control
 problem, but I'm intrigued by its complexities and capabilities for other
 scenarios. We should still investigate pushing a disable flag into YARN.
  YARN-2604, YARN-3079. YARN-2604 seems to have been added because of a
  genuine problem where an app's AM container size exceeds the size of the
  largest NM node in the cluster.
 This still needs a way to be disabled, because an auto-scaling Hadoop
 cluster wouldn't worry about insufficient capacity. It would just make
 more.

 On Fri, Jul 10, 2015 at 11:13 AM, Santosh Marella smare...@maprtech.com
 wrote:

  Good point. YARN seems to have added this admission control as part of
  YARN-2604 and YARN-3079. YARN-2604 seems to have been added because of a
  genuine problem where an app's AM container size exceeds the size of the
  largest NM node in the cluster. They also have a configurable interval
  that controls how long admission control stays relaxed after the RM's
  startup (yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms).
  This was added to avoid rejecting apps submitted after the RM (re)starts
  and before any NMs register with the RM.
 
  One option is to use a larger value for the above configuration parameter
  in Myriad-based YARN clusters. However, it might be worth examining the
  effects of doing that in detail, since the same config param is also used
  by the work-preserving RM restart feature.
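
  For concreteness, a minimal sketch of that override; the 10-minute value
  is purely illustrative, and the same property can of course be set in
  yarn-site.xml instead:

      import org.apache.hadoop.yarn.conf.YarnConfiguration;

      final class SchedulingWaitOverrideSketch {
        // Sketch only: give the RM a longer scheduling-wait window so apps
        // submitted right after an RM (re)start, before any NM registers,
        // are not rejected.
        static YarnConfiguration withLongerSchedulingWait() {
          YarnConfiguration conf = new YarnConfiguration();
          conf.setLong(
              "yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms",
              10 * 60 * 1000L);
          return conf;
        }
        // Caveat, as above: the same property also governs work-preserving RM
        // restart, so raising it has effects beyond admission control.
      }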
 
  Another option is to add a flag to disable admission control in the RM and
  push the change into YARN.
 
  In addition to (or irrespective of) the above, I think the following
  problems should still be fixed in Myriad:
  a. FGS shouldn't set the NM's capacity to (0G,0CPU) during registration:
  if an NM is launched with a medium profile and FGS sets its capacity to
  (0G,0CPU), the RM will never schedule containers on this NM unless FGS
  expands the capacity with additional Mesos offers. Essentially, the
  capacity used for launching the NM will not be utilized at