Thank you all for the comments and suggestions. I summarized the discussion as an implementation plan on a wiki page:
https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler

If this is amenable, we can take this to the dev list to plan the development in two phases: first implement the Throttle-Job in the short term, and then plan the Auto-Scheduling capabilities.

Suresh

On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:

> It seems to me that among the many possible functions a metascheduler (MS) would provide, there are two separate ones that must be addressed first. The two use cases implied are as follows.
>
> (1) The gateway submits a group of jobs to a specified resource where the count of jobs exceeds the resource’s queued-job limit. Let’s say 300 very quick jobs are submitted, where the limit is 50 per community user. The MS must maintain an internal queue and release jobs to the resource in groups with job counts under the limit (say, 40 at a time).
>
> (2) The gateway submits a job or set of jobs with a flag that specifies that Airavata choose the resource. Here, MCP or some other mechanism arrives eventually at the specific resource that completes the job(s).
>
> Where both uses are needed - unspecified resource and a group of jobs with count exceeding limits - the MS action would be best defined by knowing the definitions and mechanisms employed in the two separate functions. For example, if MCP is employed, the initial brute-force test submissions might need to be done using the determined number of jobs at a time (e.g., 40). But the design here must adhere to design criteria arrived at for both function (1) and function (2).
>
> In UltraScan’s case, the most immediate need is for (1). The user could manually determine the best resource or just make a reasonable guess. What the user does not want to do is manually release jobs 40 at a time.
> The gateway interface allows specification of a group of 300 jobs, and the user does not care what is going on under the covers to effect the running of all of them eventually. So, I guess I am lobbying for addressing (1) first, both to meet UltraScan’s immediate need and to elucidate the design of more sophisticated functionality.
>
> - Gary
>
> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
>
>> Hi Kenneth,
>>
>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
>>
>>> The tricky thing is the need to maintain an internal queue of jobs when the Stampede queued-jobs limit is reached. If Airavata has an internal representation for jobs to be submitted, I think you are most of the way there.
>>
>> Airavata has an internal representation of jobs, but there is no good global view of all the jobs running on a given resource for a given community account. We are trying to fix this; once this is done, as you say, the FIFO implementation should be straightforward.
>>
>>> It is tricky to do resource-matching scheduling when the job mix is not known. For example, the scheduler does not know whether to preserve memory vs. cores when deciding where to place a job. Also, the interactions of the gateway scheduler and the local schedulers may be complicated to predict.
>>>
>>> Fair share is probably not a good idea. In practice, it tends to disrupt the other scheduling policies such that one group is penalized and the others don't run much earlier.
>>
>> Interesting. What do you think of the capacity-based scheduling algorithm (linked below)?
>>
>>> One option is to maintain the gateway job queue internally, then use the MCP brute-force approach: submit to all resources, then cancel after the first job start. You may also want to allow the gateway to set per-resource policy limits on number of jobs, job duration, job core size, SUs, etc.
>> MCP is something we should try. The limits per gateway per resource exist, but we need to exercise these capabilities.
>>
>> Suresh
>>
>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>>> Hi All,
>>>>
>>>> Need some guidance on identifying a scheduling strategy and a pluggable third-party implementation for Airavata's scheduling needs. For context, let me describe the use cases for scheduling within Airavata:
>>>>
>>>> * If a gateway/user is submitting a series of jobs, Airavata is currently not throttling them and is sending them to compute clusters (in a FIFO way). Resources enforce a per-user job limit within a queue to ensure fair use of the clusters (example: Stampede allows 50 jobs per user in the normal queue [1]). Airavata will need to implement queues and throttle jobs, respecting the max-jobs-per-queue limits of an underlying resource queue.
>>>>
>>>> * The current version of Airavata is also not performing job scheduling across available computational resources and expects gateways/users to pick resources during experiment launch. Airavata will need to implement schedulers which become aware of existing loads on the clusters and spread jobs efficiently. The scheduler should be able to get access to heuristics on previous executions and current requirements, which include job size (number of nodes/cores), memory requirements, wall-time estimates, and so forth.
>>>>
>>>> * As Airavata is mapping multiple individual user jobs into one or more community-account submissions, it also becomes critical to implement fair-share scheduling among these users to ensure fair use of allocations as well as allowable queue limits.
>>>>
>>>> Other use cases?
>>>>
>>>> We will greatly appreciate it if folks on this list can shed light on experiences using schedulers implemented in Hadoop, Mesos, Storm, or other frameworks outside of their intended use.
>>>> For instance, the Hadoop (YARN) capacity [2] and fair schedulers [3][4][5] seem to meet the needs of Airavata. Is it a good idea to attempt to reuse these implementations? Any other pluggable third-party alternatives?
>>>>
>>>> Thanks in advance for your time and insights,
>>>>
>>>> Suresh
>>>>
>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
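P.S. To make the Throttle-Job behavior in use case (1) concrete, here is a minimal sketch of the internal FIFO queue discussed above: 300 jobs are accepted at once, but releases to the resource stay under the per-community-user cap (50 on Stampede), in batches of 40. All class and method names here are made up for illustration; this is not an existing Airavata API.

```python
from collections import deque

class ThrottlingQueue:
    """FIFO queue that releases jobs to one resource without exceeding
    the resource's per-community-user queued-job limit (e.g. 50)."""

    def __init__(self, queued_job_limit, release_batch):
        self.queued_job_limit = queued_job_limit  # cap enforced by the resource
        self.release_batch = release_batch        # e.g. 40, kept under the cap
        self.pending = deque()                    # jobs accepted but not submitted
        self.active = set()                       # jobs currently queued/running

    def enqueue(self, job_id):
        """Accept a job from the gateway; it waits until headroom exists."""
        self.pending.append(job_id)

    def on_job_finished(self, job_id):
        """Callback when the resource reports a job done; frees a slot."""
        self.active.discard(job_id)

    def release(self):
        """Return the next batch of jobs that may be submitted right now."""
        headroom = min(self.release_batch,
                       self.queued_job_limit - len(self.active))
        batch = []
        while self.pending and len(batch) < headroom:
            job = self.pending.popleft()
            self.active.add(job)
            batch.append(job)
        return batch
```

With 300 jobs enqueued, the first call to `release()` yields 40 jobs, the second yields 10 (the 50-job cap is then full), and further batches flow only as finished jobs free slots. The per-resource policy limits Kenneth mentions (duration, core size, SUs) could be enforced as additional headroom checks in `release()`.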

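And a rough sketch of the MCP brute-force placement Kenneth suggests (submit everywhere, keep the first copy that starts, cancel the rest). The three callbacks are illustrative stand-ins for whatever submission layer Airavata exposes, not real APIs.

```python
import time

def mcp_submit(job, resources, submit, job_started, cancel, poll_seconds=5):
    """MCP-style brute force: submit copies of one job to every candidate
    resource, then cancel the copies that did not start first.

    submit(resource, job) -> handle for the queued copy
    job_started(handle)   -> True once the local scheduler starts the copy
    cancel(handle)        -> remove a still-queued copy
    (All three are hypothetical callbacks supplied by the caller.)
    """
    handles = {r: submit(r, job) for r in resources}
    while True:
        for resource, handle in handles.items():
            if job_started(handle):
                # First copy to start wins; cancel all the others.
                for other, h in handles.items():
                    if other != resource:
                        cancel(h)
                return resource
        time.sleep(poll_seconds)  # poll the local schedulers periodically
```

For a *group* of jobs, as Gary notes, the test submissions themselves would have to pass through the throttling queue (e.g. 40 at a time) so the probes do not blow the per-resource limits.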