Eran, 

This is a good read and in fact describes a very similar situation (picking 
a well-known solution vs. writing our own). As you may recollect, Airavata’s 
key challenge is identifying, across many resources, the ones with the 
shortest queue time. And of course, it will have use cases like re-using 
cloud resources for individual jobs that are part of a larger workflow (a 
flavor of your thesis topic, if you still remember) and so on. So my question 
is: are Mesos’s and Aurora’s use cases about managing a fixed set of 
resources, i.e., the challenge of spreading M jobs across N resources 
efficiently with fair share, varying memory and I/O requirements, and so on? 
Or did you also come across examples that resonate with meta-schedulers 
interacting with multiple lower-level schedulers? 

Thanks,
Suresh

On Sep 4, 2014, at 5:38 PM, Eran Chinthaka Withana <[email protected]> 
wrote:

> oops, sorry. Here it is:
> http://www.mail-archive.com/[email protected]/msg01417.html
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Thu, Sep 4, 2014 at 2:22 PM, Suresh Marru <[email protected]> wrote:
> 
>> Hi Eran, Jijoe
>> 
>> Can you share the missing reference you indicate below?
>> 
>> Of course, by all means it is good for Airavata to build over projects like
>> Mesos; that is my motivation for this discussion. I am not yet suggesting
>> implementing a scheduler, as that would be a distraction. The meta-scheduler
>> I illustrated is mere routing, with a simple FIFO, to be injected into
>> Airavata job management. We look forward to hearing from you all on what
>> the right third-party software is. Manu Singh, a first-year graduate
>> student at IU, has volunteered to do an academic study of these solutions,
>> so we will appreciate pointers.
>> 
>> Suresh
>> 
>> On Sep 3, 2014, at 11:59 AM, Eran Chinthaka Withana <
>> [email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> Before you go ahead and implement your own, consider reading this mail
>>> thread [1] and looking at how frameworks like Apache Aurora do it on top
>>> of Apache Mesos. These may provide good input for this implementation.
>>> 
>>> (thanks also to Jijoe, who provided input for this)
>>> 
>>> 
>>> 
>>> Thanks,
>>> Eran Chinthaka Withana
>>> 
>>> 
>>> On Wed, Sep 3, 2014 at 5:50 AM, Suresh Marru <[email protected]> wrote:
>>> 
>>>> Thank you all for the comments and suggestions. I summarized the
>>>> discussion as an implementation plan on a wiki page:
>>>> 
>>>> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
>>>> 
>>>> If this is amenable, we can take this to the dev list to plan the
>>>> development in two phases: first implement the Throttle-Job capability
>>>> in the short term, and then plan the Auto-Scheduling capabilities.
>>>> 
>>>> Suresh
>>>> 
>>>> On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:
>>>> 
>>>>> It seems to me that among many possible functions a metascheduler (MS)
>>>>> would provide, there are two separate ones that must be addressed first.
>>>>> The two use cases implied are as follows.
>>>>> 
>>>>> (1) The gateway submits a group of jobs to a specified resource where
>>>>> the count of jobs exceeds the resource’s queued-job limit. Let’s say 300
>>>>> very quick jobs are submitted, where the limit is 50 per community user.
>>>>> The MS must maintain an internal queue and release jobs to the resource
>>>>> in groups with job counts under the limit (say, 40 at a time).
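>>>>> 
>>>>> Roughly, the queue behavior I mean, as a sketch (class and method names
>>>>> are invented for illustration, not existing Airavata code):
>>>>> 
>>>>> // Holds jobs for one (resource, community user) pair and releases them
>>>>> // in batches that keep the count queued on the resource under its limit.
>>>>> class ThrottledQueue {
>>>>>     private final java.util.Queue<String> pending =
>>>>>         new java.util.ArrayDeque<String>();
>>>>>     private final int resourceLimit;  // e.g., 50 per community user
>>>>>     private final int batchSize;      // e.g., release 40 at a time
>>>>>     private int queuedOnResource = 0; // kept current by status updates
>>>>> 
>>>>>     ThrottledQueue(int resourceLimit, int batchSize) {
>>>>>         this.resourceLimit = resourceLimit;
>>>>>         this.batchSize = batchSize;
>>>>>     }
>>>>> 
>>>>>     void submit(java.util.List<String> jobIds) { pending.addAll(jobIds); }
>>>>> 
>>>>>     // Called when jobs finish (making room) or on a timer; the caller
>>>>>     // actually submits the returned batch to the resource.
>>>>>     java.util.List<String> nextBatch() {
>>>>>         int room = Math.min(batchSize, resourceLimit - queuedOnResource);
>>>>>         java.util.List<String> batch = new java.util.ArrayList<String>();
>>>>>         while (room-- > 0 && !pending.isEmpty()) {
>>>>>             batch.add(pending.poll());
>>>>>             queuedOnResource++;
>>>>>         }
>>>>>         return batch;
>>>>>     }
>>>>> 
>>>>>     void jobCompleted() { queuedOnResource--; }
>>>>> }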
>>>>> 
>>>>> (2) The gateway submits a job or set of jobs with a flag that specifies
>>>>> that Airavata choose the resource. Here, MCP or some other mechanism
>>>>> arrives eventually at the specific resource that completes the job(s).
>>>>> 
>>>>> Where both uses are needed - unspecified resource and a group of jobs
>>>>> with count exceeding limits - the MS action would be best defined by
>>>>> knowing the definitions and mechanisms employed in the two separate
>>>>> functions. For example, if MCP is employed, the initial brute-force test
>>>>> submissions might need to be done using the determined number of jobs
>>>>> at a time (e.g., 40). But the design here must adhere to design criteria
>>>>> arrived at for both function (1) and function (2).
>>>>> 
>>>>> In UltraScan’s case, the most immediate need is for (1). The user could
>>>>> manually determine the best resource or just make a reasonable guess.
>>>>> What the user does not want to do is manually release jobs 40 at a time.
>>>>> The gateway interface allows specification of a group of 300 jobs, and
>>>>> the user does not care what is going on under the covers to effect the
>>>>> running of all of them eventually. So, I guess I am lobbying for
>>>>> addressing (1) first, both to meet UltraScan’s immediate need and to
>>>>> elucidate the design of more sophisticated functionality.
>>>>> 
>>>>> - Gary
>>>>> 
>>>>> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
>>>>> 
>>>>>> Hi Kenneth,
>>>>>> 
>>>>>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> The tricky thing is the need to maintain an internal queue of
>>>>>>> jobs when the Stampede queued-jobs limit is reached.  If Airavata
>>>>>>> has an internal representation for jobs to be submitted, I think you
>>>>>>> are most of the way there.
>>>>>> 
>>>>>> Airavata has an internal representation of jobs, but there is no good
>>>>>> global view of all the jobs running on a given resource for a given
>>>>>> community account. We are trying to fix this; once it is done, as you
>>>>>> say, the FIFO implementation should be straightforward.
>>>>>> 
>>>>>>> It is tricky to do resource-matching scheduling when the job mix
>>>>>>> is not known.  For example, the scheduler does not know whether
>>>>>>> to preserve memory vs cores when deciding where to place a job.
>>>>>>> Also, the interactions of the gateway scheduler and the local
>>>>>>> schedulers may be complicated to predict.
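>>>>>>> 
>>>>>>> To make the memory-vs-cores point concrete, here is a toy placement
>>>>>>> score (all names invented; this is not from any real scheduler). The
>>>>>>> weight ALPHA is exactly the knob you cannot set well without knowing
>>>>>>> the upcoming job mix:
>>>>>>> 
>>>>>>> // Toy placement score: higher is better. ALPHA decides whether we
>>>>>>> // prefer preserving free memory or free cores, the very tradeoff
>>>>>>> // that is undecidable without knowledge of the future job mix.
>>>>>>> class PlacementScore {
>>>>>>>     static final double ALPHA = 0.5;  // arbitrary weight, the crux
>>>>>>> 
>>>>>>>     static double score(double freeCores, double totalCores,
>>>>>>>                         double freeMemGb, double totalMemGb) {
>>>>>>>         double coreSlack = freeCores / totalCores;   // fraction free
>>>>>>>         double memSlack  = freeMemGb / totalMemGb;   // fraction free
>>>>>>>         return ALPHA * coreSlack + (1 - ALPHA) * memSlack;
>>>>>>>     }
>>>>>>> }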
>>>>>>> 
>>>>>>> Fair share is probably not a good idea.  In practice, it tends
>>>>>>> to disrupt the other scheduling policies such that one group is
>>>>>>> penalized and the others don't run much earlier.
>>>>>> 
>>>>>> Interesting. What do you think of the capacity-based scheduling
>>>>>> algorithm (linked below)?
>>>>>> 
>>>>>>> 
>>>>>>> One option is to maintain the gateway job queue internally,
>>>>>>> then use the MCP brute force approach: submit to all resources,
>>>>>>> then cancel after the first job start.  You may also want to
>>>>>>> allow the gateway to set per-resource policy limits on
>>>>>>> number of jobs, job duration, job core size, SUs, etc.
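>>>>>>> 
>>>>>>> In rough Java, the brute-force step could look like this (the
>>>>>>> SubmissionLayer interface is a stand-in for whatever the gateway's
>>>>>>> real submission machinery provides; none of these names exist today):
>>>>>>> 
>>>>>>> interface SubmissionLayer {
>>>>>>>     void submit(String resource, String job);
>>>>>>>     void waitUntilRunning(String resource, String job)
>>>>>>>             throws InterruptedException;
>>>>>>>     void cancel(String resource, String job);
>>>>>>> }
>>>>>>> 
>>>>>>> class McpPlacer {
>>>>>>>     // Submit the same job everywhere, keep whichever resource starts
>>>>>>>     // it first, and cancel the duplicates on every other resource.
>>>>>>>     static String placeByFirstStart(final SubmissionLayer layer,
>>>>>>>             java.util.List<String> resources, final String job)
>>>>>>>             throws InterruptedException {
>>>>>>>         final java.util.concurrent.BlockingQueue<String> started =
>>>>>>>             new java.util.concurrent.LinkedBlockingQueue<String>();
>>>>>>>         for (final String r : resources) {
>>>>>>>             new Thread(new Runnable() {
>>>>>>>                 public void run() {
>>>>>>>                     try {
>>>>>>>                         layer.submit(r, job);
>>>>>>>                         layer.waitUntilRunning(r, job); // sits in queue
>>>>>>>                         started.offer(r); // r started the job first
>>>>>>>                     } catch (InterruptedException ignored) { }
>>>>>>>                 }
>>>>>>>             }).start();
>>>>>>>         }
>>>>>>>         String winner = started.take(); // first resource to start
>>>>>>>         for (String r : resources) {
>>>>>>>             if (!r.equals(winner)) layer.cancel(r, job); // kill dupes
>>>>>>>         }
>>>>>>>         return winner;
>>>>>>>     }
>>>>>>> }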
>>>>>> 
>>>>>> MCP is something we should try. The limits per gateway per resource
>>>>>> exist, but we need to exercise these capabilities.
>>>>>> 
>>>>>> Suresh
>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> I need some guidance on identifying a scheduling strategy and a
>>>>>>>> pluggable third-party implementation for Airavata scheduling needs.
>>>>>>>> For context, let me describe the use cases for scheduling within
>>>>>>>> Airavata:
>>>>>>>> 
>>>>>>>> * If a gateway/user submits a series of jobs, Airavata currently
>>>>>>>> does not throttle them and sends them straight to compute clusters
>>>>>>>> (in a FIFO way). Resources enforce a per-user job limit within a
>>>>>>>> queue to ensure fair use of the clusters (example: Stampede allows 50
>>>>>>>> jobs per user in the normal queue [1]). Airavata will need to
>>>>>>>> implement queues and throttle jobs respecting the max-jobs-per-queue
>>>>>>>> limits of an underlying resource queue.
>>>>>>>> 
>>>>>>>> * The current version of Airavata also does not perform job
>>>>>>>> scheduling across available computational resources; it expects
>>>>>>>> gateways/users to pick resources during experiment launch. Airavata
>>>>>>>> will need to implement schedulers which become aware of existing
>>>>>>>> loads on the clusters and spread jobs efficiently. The scheduler
>>>>>>>> should be able to get access to heuristics on previous executions
>>>>>>>> and current requirements, which include job size (number of
>>>>>>>> nodes/cores), memory requirements, wall-time estimates, and so forth.
>>>>>>>> 
>>>>>>>> * As Airavata maps multiple individual user jobs into one or more
>>>>>>>> community-account submissions, it also becomes critical to implement
>>>>>>>> fair-share scheduling among these users to ensure fair use of
>>>>>>>> allocations as well as of allowable queue limits (a strawman sketch
>>>>>>>> follows below).
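>>>>>>>> 
>>>>>>>> As a strawman, fair share here could be as simple as always releasing
>>>>>>>> the next job of the user who has consumed the fewest service units so
>>>>>>>> far; real policies (e.g., YARN's fair/capacity schedulers) are much
>>>>>>>> richer. All names below are invented for illustration:
>>>>>>>> 
>>>>>>>> // Strawman fair share across users mapped onto a community account.
>>>>>>>> class FairSharePicker {
>>>>>>>>     private final java.util.Map<String, Double> usedSUs =
>>>>>>>>         new java.util.HashMap<String, Double>();
>>>>>>>>     private final java.util.Map<String, java.util.Queue<String>> jobs =
>>>>>>>>         new java.util.HashMap<String, java.util.Queue<String>>();
>>>>>>>> 
>>>>>>>>     void enqueue(String user, String jobId) {
>>>>>>>>         if (!jobs.containsKey(user)) {
>>>>>>>>             jobs.put(user, new java.util.ArrayDeque<String>());
>>>>>>>>             usedSUs.put(user, 0.0);
>>>>>>>>         }
>>>>>>>>         jobs.get(user).add(jobId);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     // Next job to submit under the community account, null if none:
>>>>>>>>     // pick the waiting user with the lowest service-unit consumption.
>>>>>>>>     String next() {
>>>>>>>>         String poorest = null;
>>>>>>>>         for (String user : jobs.keySet()) {
>>>>>>>>             if (jobs.get(user).isEmpty()) continue;
>>>>>>>>             if (poorest == null
>>>>>>>>                     || usedSUs.get(user) < usedSUs.get(poorest)) {
>>>>>>>>                 poorest = user;
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>         return poorest == null ? null : jobs.get(poorest).poll();
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     // Called when a job finishes, with the SUs it actually consumed.
>>>>>>>>     void charge(String user, double serviceUnits) {
>>>>>>>>         usedSUs.put(user, usedSUs.get(user) + serviceUnits);
>>>>>>>>     }
>>>>>>>> }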
>>>>>>>> 
>>>>>>>> Other use cases?
>>>>>>>> 
>>>>>>>> We will greatly appreciate it if folks on this list can shed light
>>>>>>>> on experiences using schedulers implemented in Hadoop, Mesos, Storm,
>>>>>>>> or other frameworks outside of their intended use. For instance, the
>>>>>>>> Hadoop (YARN) capacity [2] and fair [3][4][5] schedulers seem to meet
>>>>>>>> the needs of Airavata. Is it a good idea to attempt to reuse these
>>>>>>>> implementations? Any other pluggable third-party alternatives?
>>>>>>>> 
>>>>>>>> Thanks in advance for your time and insights,
>>>>>>>> 
>>>>>>>> Suresh
>>>>>>>> 
>>>>>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>>>>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>>>>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>>>>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
