John Omernik created MYRIAD-179:
-----------------------------------

             Summary: Support Revocable resources in Mesos
                 Key: MYRIAD-179
                 URL: https://issues.apache.org/jira/browse/MYRIAD-179
             Project: Myriad
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: Myriad 0.1.0
            Reporter: John Omernik


Mesos has introduced revocable resources. Based on my reading of things, 
Myriad would be an awesome use case for oversubscription, especially when 
combined with Fine-Grained Scaling (FGS).

Based on what I've read about oversubscription, if Myriad were aware of it, 
Myriad could be smart about placing the various YARN containers. Production 
jobs could be tagged so that they run on non-revocable resources, while other 
YARN jobs from certain users or with certain flags (especially in FGS mode) 
could be submitted using revocable resources. This would be exceptionally 
powerful for big MapReduce jobs, etc.

These would be the ad hoc jobs. In addition to not using resources when no 
jobs are running, the NodeManagers would run those jobs on revocable 
resources, so they could be killed if needed.

I am not speaking from a dev perspective, so this may be a lot harder than 
it seems; I am just trying to outline use cases.

Another use case (I think both are very valid and worth pursuing) would be, 
once multi-tenancy is built in, to have one whole Myriad framework dedicated 
to ad hoc jobs and another Myriad framework dedicated to production jobs. The 
ad hoc framework could be set up so that all submissions run on revocable 
resources, making it appropriate for dev work or other non-production jobs. 
Obviously this hinges on being able to run two Myriad clusters on the same 
Mesos cluster.

The other thing: if a whole framework were set to use revocable resources, 
we'd have to ensure the ResourceManager runs on non-revocable resources. 
While containers for YARN jobs can be killed, we don't want the whole 
framework to die.

I see use cases for both; this just seems to add another layer of awesome 
flexibility in how jobs run on the cluster.

I'd be interested in fleshing this idea out more with the dev team.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
