We are motivated by a parameter sweep problem, but this is really a general problem for any gateway using a community credential.

Marlon

On 9/2/14, 7:50 AM, Suresh Marru wrote:
Hi All,

We need some guidance on identifying a scheduling strategy and a pluggable
third-party implementation for Airavata's scheduling needs. For context, let me
describe the use cases for scheduling within Airavata:

* If a gateway/user submits a series of jobs, Airavata currently does not
throttle them; it sends them to the compute clusters in FIFO order. Resources
enforce a per-user job limit within each queue to ensure fair use of the
clusters (for example, Stampede allows 50 jobs per user in the normal queue
[1]). Airavata will need to implement queues and throttle jobs so that it
respects the max-jobs-per-queue limit of the underlying resource queue.

* The current version of Airavata also does not schedule jobs across the
available computational resources; it expects gateways/users to pick resources
at experiment launch. Airavata will need schedulers that are aware of the
existing load on the clusters and spread jobs efficiently. The scheduler should
have access to heuristics derived from previous executions, as well as to the
current job's requirements, including job size (number of nodes/cores), memory
requirements, wall-time estimates, and so forth.

* As Airavata maps multiple individual users' jobs onto one or more community
account submissions, it also becomes critical to implement fair-share
scheduling among these users, both to ensure fair use of allocations and to
stay within the allowable queue limits.
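To make the first and third points concrete, here is a minimal sketch, assuming a single resource queue shared by one community account: jobs are held per user (FIFO within a user) and released only while fewer than the queue's per-user limit are running, and each free slot goes to the gateway user with the fewest running jobs. The class and method names (CommunityQueueThrottle, submit, release_ready, job_finished) are hypothetical and not part of Airavata. Real fair-share schedulers, such as the YARN one referenced below, also weigh historical usage rather than only current running counts.

```python
from collections import defaultdict, deque


class CommunityQueueThrottle:
    """Hypothetical throttle for one resource queue used by a community account.

    Never lets more than `max_jobs` run at once (e.g. 50 for Stampede's normal
    queue), and hands each free slot to the user with the fewest running jobs.
    """

    def __init__(self, max_jobs):
        self.max_jobs = max_jobs
        self.pending = defaultdict(deque)  # user -> FIFO of waiting job ids
        self.running = defaultdict(int)    # user -> number of running jobs
        self.owner = {}                    # job id -> submitting user
        self.arrival = {}                  # user -> order first seen (tie-breaker)

    def submit(self, user, job_id):
        self.arrival.setdefault(user, len(self.arrival))
        self.pending[user].append(job_id)
        self.owner[job_id] = user

    def total_running(self):
        return sum(self.running.values())

    def release_ready(self):
        """Return the job ids that may be sent to the cluster right now."""
        released = []
        while self.total_running() < self.max_jobs:
            waiting = [u for u, q in self.pending.items() if q]
            if not waiting:
                break
            # Fair share: fewest running jobs first, ties broken by arrival order.
            user = min(waiting, key=lambda u: (self.running[u], self.arrival[u]))
            job_id = self.pending[user].popleft()
            self.running[user] += 1
            released.append(job_id)
        return released

    def job_finished(self, job_id):
        """Free the slot held by a completed job."""
        user = self.owner.pop(job_id)
        self.running[user] -= 1


throttle = CommunityQueueThrottle(max_jobs=3)
for i in range(4):
    throttle.submit("alice", f"a{i}")
throttle.submit("bob", "b0")
print(throttle.release_ready())  # ['a0', 'b0', 'a1']
```

Even though alice submitted first and has more jobs waiting, bob's single job is released in the second slot, and alice's remaining jobs stay queued until a slot frees up.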
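For the second point, one simple heuristic is to discard resources whose limits cannot accommodate the job's core count or wall time, then choose the eligible resource with the smallest estimated wait. This is a minimal sketch under stated assumptions: the Resource and JobRequest fields, the wait estimate (queued core-hours divided by total cores), and the cluster names and numbers are all hypothetical placeholders. A real scheduler would obtain these figures from cluster monitoring and the execution history mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical snapshot of a cluster queue, e.g. gathered by a monitor.
    name: str
    total_cores: int
    max_cores_per_job: int
    max_walltime_hours: float
    queued_core_hours: float  # backlog already waiting in the queue


@dataclass
class JobRequest:
    cores: int
    walltime_hours: float


def estimated_wait_hours(resource):
    # Crude load estimate: time to drain the backlog if it used every core.
    return resource.queued_core_hours / resource.total_cores


def pick_resource(job, resources):
    """Return the least-loaded resource able to run `job`, or None."""
    eligible = [
        r for r in resources
        if job.cores <= r.max_cores_per_job
        and job.walltime_hours <= r.max_walltime_hours
    ]
    if not eligible:
        return None
    return min(eligible, key=estimated_wait_hours)


resources = [
    Resource("cluster-a", total_cores=1000, max_cores_per_job=256,
             max_walltime_hours=48, queued_core_hours=5000),
    Resource("cluster-b", total_cores=400, max_cores_per_job=64,
             max_walltime_hours=24, queued_core_hours=400),
    Resource("cluster-c", total_cores=2000, max_cores_per_job=1024,
             max_walltime_hours=12, queued_core_hours=4000),
]
print(pick_resource(JobRequest(cores=128, walltime_hours=24), resources).name)  # cluster-a
```

Here cluster-a is the only resource that can run a 128-core, 24-hour job. A small 32-core, 10-hour job would instead go to cluster-b, whose estimated backlog is the shortest.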

Other use cases?

We would greatly appreciate it if folks on this list could share experiences
using schedulers implemented in Hadoop, Mesos, Storm, or other frameworks
outside of their intended use. For instance, the Hadoop (YARN) capacity [2] and
fair [3][4][5] schedulers seem to meet Airavata's needs. Is it a good idea to
attempt to reuse these implementations? Are there any other pluggable
third-party alternatives?

Thanks in advance for your time and insights,

Suresh

[1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
[2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
[3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
[4] - https://issues.apache.org/jira/browse/HADOOP-3746
[5] - https://issues.apache.org/jira/browse/YARN-326
