John Omernik created MYRIAD-179:
-----------------------------------
Summary: Support Revocable resources in Mesos
Key: MYRIAD-179
URL: https://issues.apache.org/jira/browse/MYRIAD-179
Project: Myriad
Issue Type: Improvement
Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: John Omernik
Mesos has introduced revocable resources. Based on my reading of things,
Myriad would be an awesome use case for oversubscription, especially when you
combine it with Fine Grained Scaling (FGS).
Based on what I've read on oversubscription, if Myriad were aware of
oversubscription, it could be smart about where various YARN containers run.
Some jobs, such as production jobs, could be tagged so that they run only on
non-revocable resources, while other YARN jobs with certain users/flags,
especially in FGS mode, could be submitted using revocable resources. This
would be exceptionally powerful for big MapReduce jobs and the like, which tend
to be adhoc in nature. In addition to not holding resources when no jobs are
running, the node managers would run those jobs on revocable resources, so they
could be killed if needed.
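A rough sketch of the tagging idea above, in Python for brevity. Everything
here is hypothetical and for illustration only: `Resource`, `Job`, the
`production` flag, and `pick_resources` are made-up names, not Myriad or Mesos
APIs, and a real scheduler would also have to check that the amounts are
sufficient.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for one resource in a Mesos offer."""
    name: str         # e.g. "cpus" or "mem"
    amount: float
    revocable: bool   # True if the resource could be reclaimed at any time


@dataclass(frozen=True)
class Job:
    """Hypothetical YARN job with a tag saying whether it is production."""
    name: str
    production: bool


def pick_resources(job: Job, offered: List[Resource]) -> List[Resource]:
    """Return the offered resources this job is allowed to run on.

    Assumed policy: production jobs use only non-revocable resources,
    adhoc jobs use only revocable ones.
    """
    if job.production:
        return [r for r in offered if not r.revocable]
    return [r for r in offered if r.revocable]
```

With an offer containing 4 non-revocable CPUs and 2 revocable CPUs, a job
tagged as production would be handed only the 4 non-revocable CPUs, and an
adhoc job only the 2 revocable ones.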
I am not speaking from a dev perspective here, so this may be a lot harder than
it seems. I am just trying to outline use cases.
Another use case (I think both are very valid and worth pursuing) would be,
once we have multi-tenancy built in, to have one whole Myriad framework
dedicated to adhoc type jobs and another Myriad framework dedicated to
production jobs. The adhoc framework could be set up so that all
submissions run on revocable resources, making it appropriate for
dev work or other non-production jobs. Obviously this hinges on being
able to run two Myriad clusters on the same Mesos cluster.
The other thing is that if a whole framework were set to use revocable
resources, we'd have to ensure the resource manager was running on
non-revocable resources. While containers for YARN jobs can be killed, we don't
want the whole framework to die.
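The constraint above could be expressed as a simple placement check. This is
only a sketch in Python under assumed names: the role strings
("resource_manager", "node_manager") and the dict layout with a "revocable"
key are hypothetical and do not come from Myriad or Mesos.

```python
def can_launch(task_role, resources):
    """Return True if a task with this role may use the given resources.

    Assumed rule: the resource manager must never be placed on revocable
    resources; any other task (e.g. a node manager) may use them.
    """
    uses_revocable = any(r["revocable"] for r in resources)
    if task_role == "resource_manager":
        return not uses_revocable
    return True
```

A scheduler would run a check like this before accepting an offer for the
resource manager, and decline the offer if it returned False.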
I see use cases for both, this just seems to add another layer of awesome
flexibility as it pertains to jobs on the cluster.
I'd be interested in fleshing this idea out more with the dev team.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)