I agree with Shameera that handling the submission limit is more than job
validation. We don't want to fail the request if the user (a community user
in most gateway cases) has already exceeded the limit of a resource; we
still want to run the job after some of the running jobs finish. Doug
already provided an example of how to implement a BatchQueueThrottling
class. In my opinion, the Orchestrator is the right place to manage job
throttling and run jobs based on the limit configured per resource per
user. I am OK with the idea of starting with a BatchQueueThrottling class
and extending it to use the current GFACPassiveJobSubmitter with a
queue-management feature.
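A rough sketch of how such a BatchQueueThrottling class could sit in front of the existing submitter. All class, method, and field names here are hypothetical (only GFACPassiveJobSubmitter is named in the thread), and the actual forwarding to RabbitMQ is left as a comment:

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: instead of failing a submission that exceeds the
// per-resource/per-user limit, park it in a pending queue until a slot opens.
public class BatchQueueThrottling {
    private final int limit;  // max concurrent jobs per (resource, user)
    private final Map<String, AtomicInteger> running = new ConcurrentHashMap<>();
    private final Map<String, Queue<String>> pending = new ConcurrentHashMap<>();

    public BatchQueueThrottling(int limit) { this.limit = limit; }

    private String key(String resource, String user) { return resource + "/" + user; }

    /** Returns true if the job was forwarded immediately, false if it was queued. */
    public boolean submit(String resource, String user, String jobId) {
        String k = key(resource, user);
        AtomicInteger count = running.computeIfAbsent(k, x -> new AtomicInteger());
        if (count.incrementAndGet() <= limit) {
            // forward to GFACPassiveJobSubmitter / RabbitMQ here
            return true;
        }
        count.decrementAndGet();  // over the limit: park the job instead of failing
        pending.computeIfAbsent(k, x -> new ConcurrentLinkedQueue<>()).add(jobId);
        return false;
    }

    /** Called when a running job finishes; frees its slot and promotes one pending job. */
    public String onJobFinished(String resource, String user) {
        String k = key(resource, user);
        running.getOrDefault(k, new AtomicInteger()).decrementAndGet();
        Queue<String> q = pending.get(k);
        String next = (q == null) ? null : q.poll();
        if (next != null) {
            running.get(k).incrementAndGet();  // next job takes the freed slot
            // forward 'next' to GFACPassiveJobSubmitter here
        }
        return next;
    }
}
```

The point of the sketch is only the contract: submit() never fails on an over-limit request, it just defers the job until onJobFinished() frees a slot.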

You need to think about synchronizing the different job states to determine
the status, and make sure you obtain the right number of running jobs per
resource per user. A job can change state for different reasons, such as a
GFAC failure or the user canceling the job. We already have a table that
manages the states of running jobs, and I hope you can use that same status
data. We also use RabbitMQ to maintain the job state, and in case of GFAC
failures jobs can recover. Please consider all the possible cases when
developing a solution. You also need to decide on the default behavior in
case there is no limit, or the gateway admin did not configure one.
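To make the two concerns above concrete, here is one way the state synchronization and the default-limit decision might look. The state names and the "no configured limit means unlimited" default are assumptions for illustration, not Airavata's actual enum or policy:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: every terminal state (normal completion, GFAC failure,
// user cancel) must release the job's slot exactly once, and a missing limit
// should fall back to a well-defined default.
public class ThrottleStateHandler {
    public enum JobState { SUBMITTED, EXECUTING, COMPLETED, FAILED, CANCELED }

    // Assumed default when the gateway admin configured no limit: unlimited.
    static final int UNLIMITED = Integer.MAX_VALUE;

    private final AtomicInteger runningCount = new AtomicInteger();

    /** Resolve the effective limit; null means no limit was configured. */
    public static int effectiveLimit(Integer configuredLimit) {
        return configuredLimit == null ? UNLIMITED : configuredLimit;
    }

    /** Apply a state change, returning the running-job count after the update. */
    public int onStateChange(JobState newState) {
        switch (newState) {
            case SUBMITTED:
                return runningCount.incrementAndGet();
            case COMPLETED:
            case FAILED:    // GFAC failure: job may be recovered via RabbitMQ later
            case CANCELED:  // user cancel frees the slot the same way
                return runningCount.decrementAndGet();
            default:
                return runningCount.get();  // e.g. EXECUTING does not change the count
        }
    }
}
```

The key property is that all three terminal states decrement the counter; if a GFAC failure or a cancel is missed, the counter drifts and the throttle under-submits forever.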

Thanks
Raminder


On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <[email protected]> wrote:

>
> I think you also want to put in a check for successful submission,
> then take appropriate action on failed submission.  It can be
> difficult to keep the submission limit up-to-date.
>
> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > Hey Devs,
> >
> > Just wanted to get some input on our plan to implement the queue
> > throttling feature.
> >
> > Batch Queue Throttling:
> > - in the Orchestrator, the current submit() function in
> > GFACPassiveJobSubmitter publishes jobs to RabbitMQ immediately
> > - instead of publishing immediately, we should pass the messages to a
> > new component; call it BatchQueueClass
> > - we need this new BatchQueueClass component to periodically check when
> > we can unload jobs to submit
> >
> > Adding BatchQueueClass
> > - set up a new table (or tables) to contain compute resource names and
> > their corresponding queues' current job counts and maximum job limits
> > - data models in Airavata have information on maximum job-submission
> > limits for a queue, but no data on how many jobs are currently running
> > - the current job count will effectively act as a counter, incremented
> > when a job is submitted and decremented when a job completes
> > - once that is done, BatchQueueClass needs to periodically check the new
> > table to see whether the user's requested queue's current job count is
> > below the queue's job limit. If it is, we can pop jobs off and submit
> > them until we hit the job limit; if not, we wait until we're under the
> > job limit.
> >
> > How does this sound?
> >
> > Doug
>
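Doug's counter-and-periodic-check scheme could be sketched roughly as below. The names are hypothetical, the database table is stood in for by an in-memory counter, and publishing to RabbitMQ is left as a comment:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the periodic check: while the queue's current job
// count is below its limit, pop pending jobs and submit them.
public class BatchQueueDrainer {
    private final Queue<String> pendingJobs = new ConcurrentLinkedQueue<>();
    private final AtomicInteger currentJobs = new AtomicInteger();  // stands in for the DB counter
    private final int jobLimit;

    public BatchQueueDrainer(int jobLimit) { this.jobLimit = jobLimit; }

    public void enqueue(String jobId) { pendingJobs.add(jobId); }

    public void jobCompleted() { currentJobs.decrementAndGet(); }

    /** One pass of the periodic check; returns the jobs released this pass. */
    public List<String> drainOnce() {
        List<String> released = new ArrayList<>();
        while (currentJobs.get() < jobLimit) {
            String job = pendingJobs.poll();
            if (job == null) break;          // nothing left to submit
            currentJobs.incrementAndGet();
            released.add(job);               // would publish to RabbitMQ here
        }
        return released;
    }

    /** Run drainOnce() on a fixed schedule, per Doug's "periodically check" step. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::drainOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

In the real system the counter would live in the new table rather than in memory, so the check-and-increment would need to be a transactional read-modify-write to avoid two drain passes double-counting a slot.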
