You could try requesting an increased job limit for the community user.
SDSC sets different queued job limits for gateway vs individual users.
I think TACC would probably be receptive to that.

On Tue, Sep 02, 2014 at 09:11:00AM -0500, Borries Demeler wrote:
> Our application involves submitting several hundred quite small computational
> jobs (a couple of minutes on most clusters, ~128 cores, give or take),
> running the same code on multiple datasets.
> 
> We are hitting the limit of 50 jobs on TACC resources, with all further
> submissions failing. The problem is made worse because all users submit under
> a community account, so every submission is treated as part of the same
> allocation account.
> 
> I see a few possibilities:
> 
> 1. a separate FIFO queue, making sure none of the resources get overloaded by 
> any community account user
> 
> 2. submitting all jobs as a single job somehow, so that the job is submitted
> with the aggregate walltime of all jobs. A special workscript would spawn the
> jobs underneath the parent submission. Not sure if this is feasible or
> reasonable.
> 
> 3. spreading the jobs around all possible resources
> 
> 4. a combination of 1 and 3.
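
Re option 2 above: on SLURM systems like Stampede this kind of bundled
submission should be workable. Here is a rough sketch (in Python; the dataset
list, the ./analyze command, and the concurrency cap are all placeholders, not
your actual code) of a parent workscript that fans the per-dataset runs out as
concurrent sub-tasks under a single allocation whose walltime covers the
aggregate:

    #!/usr/bin/env python
    # Hypothetical parent workscript: one batch submission, many sub-tasks.
    # Each sub-task runs the same analysis code on one dataset; on a SLURM
    # system each command could additionally be wrapped in an `srun` job step.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    DATASETS = ["dataset-%03d" % i for i in range(300)]  # placeholder inputs
    MAX_CONCURRENT = 8   # bounded by the nodes held by the parent job

    def run_one(dataset):
        return subprocess.call(["./analyze", dataset])

    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        exit_codes = list(pool.map(run_one, DATASETS))

    failed = [d for d, rc in zip(DATASETS, exit_codes) if rc != 0]
    print("%d ok, %d failed" % (len(DATASETS) - len(failed), len(failed)))

The obvious downside is that the parent job has to request enough nodes and
walltime up front, so very uneven per-dataset runtimes waste allocation.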
> 
> -Borries
> 
> 
> 
> 
> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
> > Hi All,
> > 
> > Need some guidance on identifying a scheduling strategy and a pluggable 
> > third party implementation for airavata scheduling needs. For context let 
> > me describe the use cases for scheduling within airavata:
> > 
> > * If a gateway/user submits a series of jobs, Airavata currently does not
> > throttle them; it sends them straight to the compute clusters (in a FIFO
> > way). Resources enforce per-user job limits within a queue to ensure fair
> > use of the clusters (example: Stampede allows 50 jobs per user in the
> > normal queue [1]). Airavata will need to implement queues and throttle jobs
> > to respect the max-jobs-per-queue limit of the underlying resource queue.
> >  
> > * The current version of Airavata also does not perform job scheduling
> > across the available computational resources; it expects gateways/users to
> > pick resources at experiment launch. Airavata will need to implement
> > schedulers that are aware of the existing load on the clusters and spread
> > jobs efficiently. The scheduler should have access to heuristics from
> > previous executions and to the current requirements, which include job size
> > (number of nodes/cores), memory requirements, walltime estimates, and so
> > forth.
> > 
> > * As Airavata maps multiple individual users' jobs onto one or more
> > community-account submissions, it also becomes critical to implement
> > fair-share scheduling among these users, to ensure fair use of allocations
> > as well as of the allowed queue limits.
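
Re the first point (per-queue throttling): a simple approach is a per
(resource, queue) counter that holds jobs in an internal FIFO once the
resource's advertised per-user limit is reached. A rough sketch in Python --
the class and method names are invented for illustration, not Airavata APIs:

    # Sketch of per-queue throttling; limits keyed by (resource, queue).
    from collections import defaultdict, deque

    QUEUE_LIMITS = {("stampede", "normal"): 50}   # e.g. Stampede's limit [1]

    class Throttler:
        def __init__(self, limits):
            self.limits = limits
            self.active = defaultdict(int)      # jobs currently on each queue
            self.waiting = defaultdict(deque)   # internal FIFO per queue

        def submit(self, key, job, push_to_cluster):
            if self.active[key] < self.limits.get(key, 1):
                self.active[key] += 1
                push_to_cluster(job)            # within the limit: submit now
            else:
                self.waiting[key].append(job)   # over the limit: hold it back

        def job_finished(self, key, push_to_cluster):
            self.active[key] -= 1
            if self.waiting[key]:
                self.submit(key, self.waiting[key].popleft(), push_to_cluster)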
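Re the second point (load-aware placement): once Airavata tracks queue depth
and historical wait times per resource, resource selection can start as a
simple scoring function. Another illustrative sketch (field names and numbers
are made up, and a real scheduler would weigh far more than this):

    # Toy load-aware resource selection: filter by fit, then pick the
    # candidate with the lowest blend of backlog and estimated wait.
    def pick_resource(resources, job):
        def score(r):
            if job["cores"] > r["max_cores_per_job"]:
                return float("inf")                        # job does not fit
            backlog = r["queued_jobs"] / float(r["max_queued"])
            return backlog * 3600 + r["avg_queue_wait_s"]  # seconds-ish scale
        return min(resources, key=score)

    resources = [
        {"name": "resource-a", "queued_jobs": 48, "max_queued": 50,
         "avg_queue_wait_s": 900, "max_cores_per_job": 4096},
        {"name": "resource-b", "queued_jobs": 10, "max_queued": 72,
         "avg_queue_wait_s": 300, "max_cores_per_job": 2048},
    ]
    print(pick_resource(resources, {"cores": 128})["name"])  # -> resource-b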
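Re the third point (fair-share under the community account): a minimal
starting point is to pick, among users with pending jobs, the one with the
least recent usage, so a single heavy user cannot monopolize the allocation or
the queue slots. Again purely illustrative, with invented data structures:

    # Minimal fair-share pick among community-account users.
    def next_user(pending_by_user, recent_usage_hours):
        candidates = [u for u, jobs in pending_by_user.items() if jobs]
        return min(candidates, key=lambda u: recent_usage_hours.get(u, 0.0))

    pending_by_user = {"alice": ["j1", "j2"], "bob": ["j3"]}
    recent_usage_hours = {"alice": 120.0, "bob": 4.5}
    print(next_user(pending_by_user, recent_usage_hours))   # -> bob

The YARN fair scheduler [3] generalizes this idea with hierarchical queues and
weights, which is one argument for reusing it rather than reimplementing.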
> > 
> > Other use cases? 
> > 
> > We would greatly appreciate it if folks on this list could shed light on
> > experiences using the schedulers implemented in Hadoop, Mesos, Storm, or
> > other frameworks outside of their intended use. For instance, the Hadoop
> > (YARN) capacity [2] and fair [3][4][5] schedulers seem to meet Airavata's
> > needs. Is it a good idea to attempt to reuse these implementations? Are
> > there other pluggable third-party alternatives?
> > 
> > Thanks in advance for your time and insights,
> > 
> > Suresh
> > 
> > [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
> > [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> > [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
> > [4] - https://issues.apache.org/jira/browse/HADOOP-3746
> > [5] - https://issues.apache.org/jira/browse/YARN-326
> > 
> > 
