Hi Eran, Jijoe

Can you share the missing reference you indicate below? 

Of course, by all means it is good for Airavata to build over projects like Mesos; 
that is my motivation for this discussion. I am not yet suggesting implementing a 
scheduler; that would be a distraction. The metascheduler I illustrated is mere 
routing, backed by a simple FIFO, to be injected into Airavata job management. We 
look forward to hearing suggestions from you all on what the right third-party 
software is. Manu Singh, a first-year graduate student at IU, has volunteered to 
do an academic study of these solutions, so we will appreciate pointers. 
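
To make the routing concrete, below is a minimal sketch in Java (class and 
method names are hypothetical, not actual Airavata APIs) of a FIFO queue that 
releases jobs to a resource only while we are under that resource's 
queued-job limit:

import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not actual Airavata code: a trivial FIFO
// "metascheduler" that routes jobs to one resource while honoring
// that resource's queued-job limit (e.g., 50 per community user).
public class FifoRouter {
    private final Queue<String> pending = new ArrayDeque<>(); // job ids, FIFO
    private final int maxQueuedJobs;   // resource's per-user queued-job limit
    private int queuedOnResource = 0;  // jobs we currently have on the resource

    public FifoRouter(int maxQueuedJobs) {
        this.maxQueuedJobs = maxQueuedJobs;
    }

    // The gateway hands a job to the router; it is released immediately
    // if the resource has room, otherwise held in the internal queue.
    public synchronized void submit(String jobId) {
        pending.add(jobId);
        drain();
    }

    // The monitoring layer calls this when the resource finishes a job.
    public synchronized void onJobCompleted(String jobId) {
        queuedOnResource--;
        drain();
    }

    private void drain() {
        while (!pending.isEmpty() && queuedOnResource < maxQueuedJobs) {
            queuedOnResource++;
            String next = pending.remove();
            // A real implementation would hand 'next' to Airavata's
            // existing job submission machinery here.
            System.out.println("Releasing " + next + " to resource");
        }
    }
}

The same counter-and-drain idea covers Gary's case below of 300 jobs against 
a limit of 50: held jobs are released as earlier ones complete, so the limit 
is never exceeded.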

Suresh

On Sep 3, 2014, at 11:59 AM, Eran Chinthaka Withana <[email protected]> 
wrote:

> Hi,
> 
> Before you go ahead and implement on your own, consider reading this mail
> thread [1] and looking at how frameworks like Apache Aurora do it on top
> of Apache Mesos. These may provide good input for this implementation.
> 
> (Thanks also to Jijoe, who provided input for this.)
> 
> 
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Wed, Sep 3, 2014 at 5:50 AM, Suresh Marru <[email protected]> wrote:
> 
>> Thank you all for the comments and suggestions. I summarized the discussion as
>> an implementation plan on a wiki page:
>> 
>> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
>> 
>> If this is amenable, we can take this to the dev list to plan the development
>> in two phases: first implement the Throttle-Job capability in the short term,
>> and then plan the Auto-Scheduling capabilities.
>> 
>> Suresh
>> 
>> On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:
>> 
>>> It seems to me that among many possible functions a metascheduler (MS)
>>> would provide, there are two separate ones that must be addressed first.
>>> The two use cases implied are as follows.
>>> 
>>> (1) The gateway submits a group of jobs to a specified resource where
>>> the count of jobs exceeds the resource’s queued job limit. Let’s say 300
>>> very quick jobs are submitted, where the limit is 50 per community user.
>>> The MS must maintain an internal queue and release jobs to the resource in
>>> groups with job counts under the limit (say, 40 at a time).
>>> 
>>> (2) The gateway submits a job or set of jobs with a flag that specifies
>>> that Airavata choose the resource. Here, MCP or some other mechanism
>>> arrives eventually at the specific resource that completes the job(s).
>>> 
>>> Where both uses are needed - unspecified resource and a group of jobs
>>> with count exceeding limits - the MS action would be best defined by
>>> knowing the definitions and mechanisms employed in the two separate
>>> functions. For example, if MCP is employed, the initial brute force test
>>> submissions might need to be done using the determined number of jobs at a
>>> time (e.g., 40). But the design here must adhere to design criteria arrived
>>> at for both function (1) and function (2).
>>> 
>>> In UltraScan’s case, the most immediate need is for (1). The user could
>>> manually determine the best resource or just make a reasonable guess. What
>>> the user does not want to do is manually release jobs 40 at a time. The
>>> gateway interface allows specification of a group of 300 jobs and the user
>>> does not care what is going on under the covers to effect the running of
>>> all of them eventually. So, I guess I am lobbying for addressing (1) first;
>>> both to meet UltraScan’s immediate need and to elucidate the design of more
>>> sophisticated functionality.
>>> 
>>> - Gary
>>> 
>>> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
>>> 
>>>> Hi Kenneth,
>>>> 
>>>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> The tricky thing is the need to maintain an internal queue of
>>>>> jobs when the Stampede queued jobs limit is reached.  If airavata
>>>>> has an internal representation for jobs to be submitted, I think you
>>>>> are most of the way there.
>>>> 
>>>> Airavata has an internal representation of jobs, but there is no good
>>>> global view of all the jobs running on a given resource for a given
>>>> community account. We are trying to fix this; once it is done, as you
>>>> say, the FIFO implementation should be straightforward.
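>>>>
>>>> As a rough illustration (hypothetical class, not the actual registry API),
>>>> that global view could start as a shared counter per (resource, community
>>>> account) pair that the job monitors update:
>>>>
>>>> import java.util.concurrent.ConcurrentHashMap;
>>>> import java.util.concurrent.ConcurrentMap;
>>>> import java.util.concurrent.atomic.AtomicInteger;
>>>>
>>>> // Hypothetical sketch of a global view of active jobs, keyed by
>>>> // (resource, community account). Not an actual Airavata class.
>>>> public class ActiveJobView {
>>>>     private final ConcurrentMap<String, AtomicInteger> counts =
>>>>             new ConcurrentHashMap<>();
>>>>
>>>>     private static String key(String resource, String account) {
>>>>         return resource + "/" + account;
>>>>     }
>>>>
>>>>     // Job monitors call these as submissions enter and leave the resource.
>>>>     public void jobSubmitted(String resource, String account) {
>>>>         counts.computeIfAbsent(key(resource, account),
>>>>                 k -> new AtomicInteger()).incrementAndGet();
>>>>     }
>>>>
>>>>     public void jobFinished(String resource, String account) {
>>>>         AtomicInteger c = counts.get(key(resource, account));
>>>>         if (c != null) {
>>>>             c.decrementAndGet();
>>>>         }
>>>>     }
>>>>
>>>>     // How many jobs this community account currently has on the resource.
>>>>     public int activeJobs(String resource, String account) {
>>>>         AtomicInteger c = counts.get(key(resource, account));
>>>>         return c == null ? 0 : c.get();
>>>>     }
>>>> }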
>>>> 
>>>>> It is tricky to do resource-matching scheduling when the job mix
>>>>> is not known.  For example, the scheduler does not know whether
>>>>> to preserve memory vs cores when deciding where to place a job.
>>>>> Also, the interactions of the gateway scheduler and the local
>>>>> schedulers may be complicated to predict.
>>>>> 
>>>>> Fair share is probably not a good idea.  In practice, it tends
>>>>> to disrupt the other scheduling policies such that one group is
>>>>> penalized and the others don't run much earlier.
>>>> 
>>>> Interesting. What do you think of the capacity-based scheduling
>>>> algorithm (linked below)?
>>>> 
>>>>> 
>>>>> One option is to maintain the gateway job queue internally,
>>>>> then use the MCP brute force approach: submit to all resources,
>>>>> then cancel after the first job start.  You may also want to
>>>>> allow the gateway to set per-resource policy limits on
>>>>> number of jobs, job duration, job core size, SUs, etc.
>>>> 
>>>> MCP is something we should try. The limits per gateway per resource
>>>> exist, but we need to exercise these capabilities.
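>>>>
>>>> Roughly, the brute-force approach you describe could look like the
>>>> following sketch (the interface and names are hypothetical, just to show
>>>> the submit-everywhere, cancel-the-rest shape):
>>>>
>>>> import java.util.List;
>>>> import java.util.Map;
>>>> import java.util.concurrent.ConcurrentHashMap;
>>>> import java.util.concurrent.atomic.AtomicBoolean;
>>>>
>>>> // Hypothetical sketch of the MCP brute-force approach: submit the
>>>> // same job to every candidate resource, keep whichever starts first,
>>>> // and cancel the rest. Not actual Airavata APIs.
>>>> interface ResourceClient {
>>>>     String submit(String jobSpec);   // returns a resource-local job id
>>>>     void cancel(String localJobId);
>>>> }
>>>>
>>>> public class BruteForcePlacer {
>>>>     private final Map<ResourceClient, String> submissions =
>>>>             new ConcurrentHashMap<>();
>>>>     private final AtomicBoolean placed = new AtomicBoolean(false);
>>>>
>>>>     public void place(String jobSpec, List<ResourceClient> candidates) {
>>>>         for (ResourceClient r : candidates) {
>>>>             submissions.put(r, r.submit(jobSpec));
>>>>         }
>>>>     }
>>>>
>>>>     // Invoked by the job-status listener when a resource reports the
>>>>     // job has started running there.
>>>>     public void onJobStarted(ResourceClient winner) {
>>>>         if (!placed.compareAndSet(false, true)) {
>>>>             return; // another resource already won the race
>>>>         }
>>>>         submissions.forEach((resource, localId) -> {
>>>>             if (resource != winner) {
>>>>                 resource.cancel(localId); // withdraw duplicate submission
>>>>             }
>>>>         });
>>>>     }
>>>> }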
>>>> 
>>>> Suresh
>>>> 
>>>>> 
>>>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> Need some guidance on identifying a scheduling strategy and a
>>>>>> pluggable third-party implementation for Airavata's scheduling needs. For
>>>>>> context, let me describe the use cases for scheduling within Airavata:
>>>>>> 
>>>>>> * If a gateway/user is submitting a series of jobs, Airavata currently
>>>>>> does not throttle them; it sends them straight to the compute clusters (in
>>>>>> a FIFO way). Resources enforce per-user job limits within a queue to ensure
>>>>>> fair use of the clusters (example: Stampede allows 50 jobs per user in the
>>>>>> normal queue [1]). Airavata will need to implement queues and throttle
>>>>>> jobs, respecting the max-jobs-per-queue limits of an underlying resource
>>>>>> queue.
>>>>>> 
>>>>>> * The current version of Airavata also does not perform job scheduling
>>>>>> across available computational resources; it expects gateways/users to
>>>>>> pick resources during experiment launch. Airavata will need to implement
>>>>>> schedulers that become aware of existing loads on the clusters and spread
>>>>>> jobs efficiently (a first-cut sketch follows this list). The scheduler
>>>>>> should be able to access heuristics on previous executions and current
>>>>>> requirements, which include job size (number of nodes/cores), memory
>>>>>> requirements, wall-time estimates, and so forth.
>>>>>> 
>>>>>> * As Airavata is mapping multiple individual user jobs into one or
>>>>>> more community account submissions, it also becomes critical to implement
>>>>>> fair-share scheduling among these users to ensure fair use of allocations
>>>>>> as well as allowable queue limits.
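>>>>>>
>>>>>> As promised above, a first cut at load-aware placement could be as simple
>>>>>> as picking the least-loaded resource that fits the job (the types and
>>>>>> fields below are illustrative, not actual Airavata models):
>>>>>>
>>>>>> import java.util.Comparator;
>>>>>> import java.util.List;
>>>>>> import java.util.Optional;
>>>>>>
>>>>>> // Hypothetical sketch of load-aware placement: choose the resource
>>>>>> // with the lowest current queue load that still fits the job.
>>>>>> class ResourceStatus {
>>>>>>     final String name;
>>>>>>     final int freeCores;
>>>>>>     final double queueLoad; // e.g., queued jobs / queue limit
>>>>>>
>>>>>>     ResourceStatus(String name, int freeCores, double queueLoad) {
>>>>>>         this.name = name;
>>>>>>         this.freeCores = freeCores;
>>>>>>         this.queueLoad = queueLoad;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> public class LoadAwareChooser {
>>>>>>     // Returns the least-loaded resource with enough free cores,
>>>>>>     // or empty if no resource can fit the job right now.
>>>>>>     public Optional<ResourceStatus> choose(List<ResourceStatus> resources,
>>>>>>                                            int coresNeeded) {
>>>>>>         return resources.stream()
>>>>>>                 .filter(r -> r.freeCores >= coresNeeded)
>>>>>>                 .min(Comparator.comparingDouble(r -> r.queueLoad));
>>>>>>     }
>>>>>> }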
>>>>>> 
>>>>>> Other use cases?
>>>>>> 
>>>>>> We will greatly appreciate it if folks on this list can shed light on
>>>>>> experiences using schedulers implemented in Hadoop, Mesos, Storm, or other
>>>>>> frameworks outside of their intended use. For instance, the Hadoop (YARN)
>>>>>> capacity [2] and fair [3][4][5] schedulers seem to meet the needs of
>>>>>> Airavata. Is it a good idea to attempt to reuse these implementations? Are
>>>>>> there any other pluggable third-party alternatives?
>>>>>> 
>>>>>> Thanks in advance for your time and insights,
>>>>>> 
>>>>>> Suresh
>>>>>> 
>>>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
