Hi Eran, Jijoe,

Can you share the missing reference you indicate below?
Of course, by all means it is good for Airavata to build over projects like Mesos; that is my motivation for this discussion. I am not yet suggesting implementing a scheduler - that would be a distraction. The metascheduler I illustrated is mere routing, to be injected into Airavata job management with a simple FIFO. We look forward to hearing options from you all on what the right third-party software is. Manu Singh, a first-year graduate student at IU, has volunteered to do an academic study of these solutions, so we will appreciate pointers.

Suresh

On Sep 3, 2014, at 11:59 AM, Eran Chinthaka Withana <[email protected]> wrote:

> Hi,
>
> Before you go ahead and implement on your own, consider reading this mail thread [1] and looking at how frameworks like Apache Aurora do it on top of Apache Mesos. These may provide good inputs for this implementation.
>
> (Thanks to Jijoe also, who provided input for this.)
>
> Thanks,
> Eran Chinthaka Withana
>
> On Wed, Sep 3, 2014 at 5:50 AM, Suresh Marru <[email protected]> wrote:
>
>> Thank you all for the comments and suggestions. I summarized the discussion as an implementation plan on a wiki page:
>>
>> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
>>
>> If this is amenable, we can take this to the dev list to plan the development in two phases: first implement the Throttle-Job capability in the short term, and then plan the Auto-Scheduling capabilities.
>>
>> Suresh
>>
>> On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:
>>
>>> It seems to me that among the many possible functions a metascheduler (MS) would provide, there are two separate ones that must be addressed first. The two use cases implied are as follows.
>>>
>>> (1) The gateway submits a group of jobs to a specified resource where the count of jobs exceeds the resource's queued-job limit. Let's say 300 very quick jobs are submitted, where the limit is 50 per community user. The MS must maintain an internal queue and release jobs to the resource in groups with job counts under the limit (say, 40 at a time); see the sketch after this message.
>>>
>>> (2) The gateway submits a job or set of jobs with a flag that specifies that Airavata choose the resource. Here, MCP or some other mechanism arrives eventually at the specific resource that completes the job(s).
>>>
>>> Where both uses are needed - unspecified resource and a group of jobs with count exceeding limits - the MS action would be best defined by knowing the definitions and mechanisms employed in the two separate functions. For example, if MCP is employed, the initial brute-force test submissions might need to be done using the determined number of jobs at a time (e.g., 40). But the design here must adhere to design criteria arrived at for both function (1) and function (2).
>>>
>>> In UltraScan's case, the most immediate need is for (1). The user could manually determine the best resource or just make a reasonable guess. What the user does not want to do is manually release jobs 40 at a time. The gateway interface allows specification of a group of 300 jobs, and the user does not care what is going on under the covers to effect the running of all of them eventually. So, I guess I am lobbying for addressing (1) first, both to meet UltraScan's immediate need and to elucidate the design of more sophisticated functionality.
>>>
>>> - Gary
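For Gary's use case (1), here is a minimal sketch of the internal queue with throttled release. The ResourceClient hooks are hypothetical stand-ins for Airavata's job management, not an existing API:

    import java.util.ArrayDeque;
    import java.util.Queue;

    /**
     * FIFO throttle sketch: hold jobs internally and release them to a
     * resource only while the community account stays under the
     * resource's queued-job limit (e.g. release 40 at a time against
     * Stampede's 50-job cap). ResourceClient is a hypothetical hook
     * into Airavata job management, not a real Airavata interface.
     */
    public class FifoThrottle {
        interface ResourceClient {
            int countQueuedJobs();     // jobs we currently have queued on the resource
            void submit(String jobId); // hand one job over to the resource
        }

        private final Queue<String> pending = new ArrayDeque<>();
        private final ResourceClient resource;
        private final int maxQueued;   // e.g. 40, safely under the 50-job limit

        public FifoThrottle(ResourceClient resource, int maxQueued) {
            this.resource = resource;
            this.maxQueued = maxQueued;
        }

        public void enqueue(String jobId) {
            pending.add(jobId);        // all 300 jobs land here immediately
        }

        /** Called periodically, or on job-completion events. */
        public void release() {
            int slots = maxQueued - resource.countQueuedJobs();
            while (slots-- > 0 && !pending.isEmpty()) {
                resource.submit(pending.remove());
            }
        }
    }

The user-facing behavior matches what Gary asks for: the gateway accepts all 300 jobs at once, and release() drains them to the resource in limit-respecting batches with no manual intervention.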
>>> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
>>>
>>>> Hi Kenneth,
>>>>
>>>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
>>>>
>>>>> The tricky thing is the need to maintain an internal queue of jobs when the Stampede queued-jobs limit is reached. If Airavata has an internal representation for jobs to be submitted, I think you are most of the way there.
>>>>
>>>> Airavata has an internal representation of jobs, but there is no good global view of all the jobs running on a given resource for a given community account. We are trying to fix this; once this is done, as you say, the FIFO implementation should be straightforward.
>>>>
>>>>> It is tricky to do resource-matching scheduling when the job mix is not known. For example, the scheduler does not know whether to preserve memory vs. cores when deciding where to place a job. Also, the interactions of the gateway scheduler and the local schedulers may be complicated to predict.
>>>>>
>>>>> Fair share is probably not a good idea. In practice, it tends to disrupt the other scheduling policies such that one group is penalized and the others don't run much earlier.
>>>>
>>>> Interesting. What do you think of the capacity-based scheduling algorithm (linked below)?
>>>>
>>>>> One option is to maintain the gateway job queue internally, then use the MCP brute-force approach: submit to all resources, then cancel after the first job start. You may also want to allow the gateway to set per-resource policy limits on number of jobs, job duration, job core size, SUs, etc.
>>>>
>>>> MCP is something we should try. The limits per gateway per resource exist, but we need to exercise these capabilities.
>>>>
>>>> Suresh
>>>>
>>>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We need some guidance on identifying a scheduling strategy and a pluggable third-party implementation for Airavata's scheduling needs. For context, let me describe the use cases for scheduling within Airavata:
>>>>>>
>>>>>> * If a gateway/user is submitting a series of jobs, Airavata currently does not throttle them; it sends them straight to the compute clusters (in a FIFO way). Resources enforce per-user job limits within a queue to ensure fair use of the clusters (example: Stampede allows 50 jobs per user in the normal queue [1]). Airavata will need to implement queues and throttle jobs, respecting the max-jobs-per-queue limits of the underlying resource queue.
>>>>>>
>>>>>> * The current version of Airavata also does not perform job scheduling across the available computational resources; it expects gateways/users to pick resources during experiment launch. Airavata will need to implement schedulers that become aware of the existing loads on the clusters and spread jobs efficiently. The scheduler should have access to heuristics on previous executions and on current requirements, which include job size (number of nodes/cores), memory requirements, wall-time estimates, and so forth.
>>>>>>
>>>>>> * As Airavata maps multiple individual user jobs into one or more community-account submissions, it also becomes critical to implement fair-share scheduling among these users, to ensure fair use of allocations as well as of allowable queue limits.
>>>>>>
>>>>>> Other use cases?
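Picking up Kenneth's MCP suggestion above (submit to all resources, cancel after the first job start), here is a minimal sketch of the brute-force pattern. The Resource interface is a hypothetical stand-in, not an Airavata or middleware API:

    import java.util.List;

    /**
     * Sketch of the MCP brute-force approach: submit the same job to
     * every candidate resource, poll until one copy starts, then
     * cancel the rest. Resource is a hypothetical stand-in, not an
     * Airavata or resource-provider API.
     */
    public class McpSubmitter {
        interface Resource {
            String submit(String jobSpec);        // returns the resource-local job id
            boolean hasStarted(String localJobId);
            void cancel(String localJobId);
        }

        /** Submits everywhere and returns the resource that won the race. */
        public Resource run(String jobSpec, List<Resource> candidates)
                throws InterruptedException {
            String[] ids = new String[candidates.size()];
            for (int i = 0; i < candidates.size(); i++) {
                ids[i] = candidates.get(i).submit(jobSpec);
            }
            while (true) {
                for (int i = 0; i < candidates.size(); i++) {
                    if (candidates.get(i).hasStarted(ids[i])) {
                        for (int j = 0; j < candidates.size(); j++) {
                            if (j != i) {
                                candidates.get(j).cancel(ids[j]); // withdraw the losers
                            }
                        }
                        return candidates.get(i);
                    }
                }
                Thread.sleep(30_000); // poll every 30s; a real version would use status events
            }
        }
    }

Note that Gary's point still applies: when a group of jobs exceeds a queue limit, these test submissions would themselves have to pass through the throttle (e.g., 40 at a time).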
>>>>>>
>>>>>> We would greatly appreciate it if folks on this list could shed light on experiences using the schedulers implemented in Hadoop, Mesos, Storm, or other frameworks outside of their intended use. For instance, the Hadoop (YARN) capacity [2] and fair schedulers [3][4][5] seem to meet Airavata's needs. Is it a good idea to attempt to reuse these implementations? Are there any other pluggable third-party alternatives? (A toy sketch of the capacity idea applied to Airavata follows at the end of this thread.)
>>>>>>
>>>>>> Thanks in advance for your time and insights,
>>>>>>
>>>>>> Suresh
>>>>>>
>>>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
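To make the capacity-scheduler question concrete, here is a toy illustration of capacity-style sharing [2] applied to Airavata's case: the community account's queued-job limit on a resource is divided among gateways by configured capacity fractions, and a gateway may borrow idle slots up to a max-capacity cap, mirroring the CapacityScheduler's elasticity. All names here are hypothetical, not YARN or Airavata APIs.

    import java.util.Map;

    /**
     * Toy capacity-style admission check: each gateway queue gets a
     * guaranteed share (capacity) of the resource's queued-job limit
     * plus an elastic cap (maxCapacity) it may grow to while slots are
     * otherwise idle. Hypothetical sketch, not YARN or Airavata code.
     */
    public class CapacityShare {
        static class GatewayQueue {
            final double capacity;    // guaranteed fraction of the limit, e.g. 0.6
            final double maxCapacity; // elastic cap, e.g. 0.9
            int queued;               // this gateway's jobs currently on the resource
            GatewayQueue(double capacity, double maxCapacity) {
                this.capacity = capacity;
                this.maxCapacity = maxCapacity;
            }
        }

        /** May 'gateway' place one more job, given the resource-wide limit? */
        static boolean maySubmit(GatewayQueue gateway,
                                 Map<String, GatewayQueue> all, int limit) {
            int totalQueued = all.values().stream().mapToInt(q -> q.queued).sum();
            if (totalQueued >= limit) {
                return false;                   // hard per-community-account limit
            }
            double used = (gateway.queued + 1) / (double) limit;
            if (used <= gateway.capacity) {
                return true;                    // within the guaranteed share
            }
            return used <= gateway.maxCapacity; // borrow idle capacity elastically
        }
    }

Whether it is worth pulling in the actual YARN implementations to get this behavior, versus reimplementing a policy this small inside Airavata's job management, is exactly the trade-off Manu's study could weigh.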
