Thank you all for the comments and suggestions. I summarized the discussion as an implementation plan on a wiki page:
https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler

If this is amenable, we can take this to the dev list to plan the development in two phases: first implement the Throttle-Job in the short term, and then plan the Auto-Scheduling capabilities.

Suresh

On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:

> It seems to me that among the many possible functions a metascheduler (MS) would provide, there are two separate ones that must be addressed first. The two use cases implied are as follows.
>
> (1) The gateway submits a group of jobs to a specified resource where the count of jobs exceeds the resource’s queued-job limit. Let’s say 300 very quick jobs are submitted, where the limit is 50 per community user. The MS must maintain an internal queue and release jobs to the resource in groups with job counts under the limit (say, 40 at a time).
>
> (2) The gateway submits a job or set of jobs with a flag that specifies that Airavata choose the resource. Here, MCP or some other mechanism arrives eventually at the specific resource that completes the job(s).
>
> Where both uses are needed - unspecified resource and a group of jobs with count exceeding limits - the MS action would be best defined by knowing the definitions and mechanisms employed in the two separate functions. For example, if MCP is employed, the initial brute-force test submissions might need to be done using the determined number of jobs at a time (e.g., 40). But the design here must adhere to design criteria arrived at for both function (1) and function (2).
>
> In UltraScan’s case, the most immediate need is for (1). The user could manually determine the best resource or just make a reasonable guess. What the user does not want to do is manually release jobs 40 at a time.
> The gateway interface allows specification of a group of 300 jobs, and the user does not care what is going on under the covers to effect the running of all of them eventually. So, I guess I am lobbying for addressing (1) first, both to meet UltraScan’s immediate need and to elucidate the design of more sophisticated functionality.
>
> - Gary
>
> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
>
>> Hi Kenneth,
>>
>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
>>
>>> The tricky thing is the need to maintain an internal queue of jobs when the Stampede queued-jobs limit is reached. If Airavata has an internal representation for jobs to be submitted, I think you are most of the way there.
>>
>> Airavata has an internal representation of jobs, but there is no good global view of all the jobs running on a given resource for a given community account. We are trying to fix this; once this is done, as you say, the FIFO implementation should be straightforward.
>>
>>> It is tricky to do resource-matching scheduling when the job mix is not known. For example, the scheduler does not know whether to preserve memory vs. cores when deciding where to place a job. Also, the interactions of the gateway scheduler and the local schedulers may be complicated to predict.
>>>
>>> Fair share is probably not a good idea. In practice, it tends to disrupt the other scheduling policies such that one group is penalized and the others don't run much earlier.
>>
>> Interesting. What do you think of the capacity-based scheduling algorithm (linked below)?
>>
>>> One option is to maintain the gateway job queue internally, then use the MCP brute-force approach: submit to all resources, then cancel after the first job start. You may also want to allow the gateway to set per-resource policy limits on number of jobs, job duration, job core size, SUs, etc.
>> MCP is something we should try. The limits per gateway per resource exist, but we need to exercise these capabilities.
>>
>> Suresh
>>
>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>>> Hi All,
>>>>
>>>> Need some guidance on identifying a scheduling strategy and a pluggable third-party implementation for Airavata's scheduling needs. For context, let me describe the use cases for scheduling within Airavata:
>>>>
>>>> * If a gateway/user is submitting a series of jobs, Airavata is currently not throttling them and is sending them to compute clusters (in a FIFO way). Resources enforce a per-user job limit within a queue to ensure fair use of the clusters (example: Stampede allows 50 jobs per user in the normal queue [1]). Airavata will need to implement queues and throttle jobs, respecting the max-jobs-per-queue limits of an underlying resource queue.
>>>>
>>>> * The current version of Airavata is also not performing job scheduling across available computational resources and expects gateways/users to pick resources during experiment launch. Airavata will need to implement schedulers which become aware of existing loads on the clusters and spread jobs efficiently. The scheduler should be able to get access to heuristics on previous executions and current requirements, which include job size (number of nodes/cores), memory requirements, wall-time estimates, and so forth.
>>>>
>>>> * As Airavata is mapping multiple individual user jobs into one or more community-account submissions, it also becomes critical to implement fair-share scheduling among these users to ensure fair use of allocations as well as allowable queue limits.
>>>>
>>>> Other use cases?
>>>>
>>>> We will greatly appreciate it if folks on this list can shed light on experiences using schedulers implemented in Hadoop, Mesos, Storm, or other frameworks outside of their intended use.
>>>> For instance, the Hadoop (YARN) capacity [2] and fair schedulers [3][4][5] seem to meet the needs of Airavata. Is it a good idea to attempt to reuse these implementations? Any other pluggable third-party alternatives?
>>>>
>>>> Thanks in advance for your time and insights,
>>>>
>>>> Suresh
>>>>
>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
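P.S. To make the Throttle-Job behavior in use case (1) concrete, here is a minimal sketch of the internal FIFO queue discussed above: 300 jobs are accepted at once, but releases to the resource stay under the per-community-user cap (50 on Stampede), in batches of 40. All class and method names here are made up for illustration; this is not an existing Airavata API.

```python
from collections import deque

class ThrottlingQueue:
    """FIFO queue that releases jobs to one resource without exceeding
    the resource's per-community-user queued-job limit (e.g. 50)."""

    def __init__(self, queued_job_limit, release_batch):
        self.queued_job_limit = queued_job_limit  # cap enforced by the resource
        self.release_batch = release_batch        # e.g. 40, kept under the cap
        self.pending = deque()                    # jobs accepted but not submitted
        self.active = set()                       # jobs currently queued/running

    def enqueue(self, job_id):
        """Accept a job from the gateway; it waits until headroom exists."""
        self.pending.append(job_id)

    def on_job_finished(self, job_id):
        """Callback when the resource reports a job done; frees a slot."""
        self.active.discard(job_id)

    def release(self):
        """Return the next batch of jobs that may be submitted right now."""
        headroom = min(self.release_batch,
                       self.queued_job_limit - len(self.active))
        batch = []
        while self.pending and len(batch) < headroom:
            job = self.pending.popleft()
            self.active.add(job)
            batch.append(job)
        return batch
```

With 300 jobs enqueued, the first call to `release()` yields 40 jobs, the second yields 10 (the 50-job cap is then full), and further batches flow only as finished jobs free slots. The per-resource policy limits Kenneth mentions (duration, core size, SUs) could be enforced as additional headroom checks in `release()`.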

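And a rough sketch of the MCP brute-force placement Kenneth suggests (submit everywhere, keep the first copy that starts, cancel the rest). The three callbacks are illustrative stand-ins for whatever submission layer Airavata exposes, not real APIs.

```python
import time

def mcp_submit(job, resources, submit, job_started, cancel, poll_seconds=5):
    """MCP-style brute force: submit copies of one job to every candidate
    resource, then cancel the copies that did not start first.

    submit(resource, job) -> handle for the queued copy
    job_started(handle)   -> True once the local scheduler starts the copy
    cancel(handle)        -> remove a still-queued copy
    (All three are hypothetical callbacks supplied by the caller.)
    """
    handles = {r: submit(r, job) for r in resources}
    while True:
        for resource, handle in handles.items():
            if job_started(handle):
                # First copy to start wins; cancel all the others.
                for other, h in handles.items():
                    if other != resource:
                        cancel(h)
                return resource
        time.sleep(poll_seconds)  # poll the local schedulers periodically
```

For a *group* of jobs, as Gary notes, the test submissions themselves would have to pass through the throttling queue (e.g. 40 at a time) so the probes do not blow the per-resource limits.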