Ah! I think I understand what you're saying now. Rather than trying to ensure we stay within the policy limits, we should just submit a job and check if it was accepted or not. If it was rejected, we can add it to a queue to be resubmitted at a later time or to a different resource. Is this correct?
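To make sure I understand, here is a rough sketch (in Python, just for illustration) of the "submit first, requeue on rejection" idea. The `RetryQueue` class and the `submit_to_resource` callback are hypothetical names, not Airavata APIs; the callback stands in for the real scheduler submission (e.g. qsub over SSH) and returns whether the job was accepted:

```python
import collections

class RetryQueue:
    """Optimistically submit jobs; park rejected ones for a later retry."""

    def __init__(self):
        self._pending = collections.deque()

    def submit(self, job, submit_to_resource):
        """Try to submit; on rejection, hold the job for resubmission."""
        accepted = submit_to_resource(job)
        if not accepted:
            self._pending.append(job)
        return accepted

    def retry_pending(self, submit_to_resource):
        """Re-attempt held jobs; keep the ones that are still rejected."""
        still_pending = collections.deque()
        while self._pending:
            job = self._pending.popleft()
            if not submit_to_resource(job):
                still_pending.append(job)
        self._pending = still_pending

    def pending_count(self):
        return len(self._pending)
```

A periodic task (or a job-completion event) would call `retry_pending`, possibly with a different resource's submit callback, which matches the "resubmit later or to a different resource" part above.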
On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <[email protected]> wrote:

> The point is that the policy limit could change at any time.
> If it does, and there is a mismatch in the limit at the resource
> and the limit in Airavata, bad things will happen. Schedulers
> will vary in the format of their policy limit output, so it's
> more reliable to monitor actual job submissions and handle failures.
> Remember that it's possible for job limits to vary for a single
> resource not only on queue name, but on job characteristics,
> such as allocation account, core count, wall clock limit, etc.
>
> On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
> > Usually these limits are set as a policy by the resource provider and
> > do not usually change. As long as we have a placeholder to
> > configure/change it in Airavata for a user/gateway, we don't need to
> > get it from a resource.
> >
> > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <[email protected]> wrote:
> >
> > > I think it would be best for us to not maintain our own record of the
> > > job limit - we need to remember that jobs will be submitted to these
> > > resources using the community accounts through other methods as well.
> > > I think I remember someone mentioning that it would be ideal to poll
> > > the resources for their limits - can anyone confirm that we can do
> > > this?
> > >
> > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <[email protected]> wrote:
> > >
> > >> Hmm @shameera, that's very true. Perhaps we can store the submission
> > >> requests in the registry. In the event that the orchestrator goes
> > >> down, we can recover them through the registry afterwards.
> > >>
> > >> @Yoshimoto, I didn't think about that - will take it into
> > >> consideration. Thanks for the insight!
> > >>
> > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <[email protected]> wrote:
> > >>
> > >>> I think you also want to put in a check for successful submission,
> > >>> then take appropriate action on failed submission. It can be
> > >>> difficult to keep the submission limit up-to-date.
> > >>>
> > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > >>> > Hey Devs,
> > >>> >
> > >>> > Just wanted to get some input on our plan to implement the queue
> > >>> > throttling feature.
> > >>> >
> > >>> > Batch Queue Throttling:
> > >>> > - in Orchestrator, the current submit() function in
> > >>> > GFACPassiveJobSubmitter publishes jobs to rabbitmq immediately
> > >>> > - instead of publishing immediately, we should pass the messages
> > >>> > to a new component, call it BatchQueueClass
> > >>> > - we need the new BatchQueueClass component to periodically check
> > >>> > to see when we can unload jobs to submit
> > >>> >
> > >>> > Adding BatchQueueClass:
> > >>> > - set up a new table(s) to contain compute resource names and
> > >>> > their corresponding queues' current job numbers and maximum job
> > >>> > limits
> > >>> > - data models in Airavata have information on maximum job
> > >>> > submission limits for a queue but no data on how many jobs are
> > >>> > currently running
> > >>> > - the current job number will effectively act as a counter, which
> > >>> > will be incremented when a job is submitted and decremented when a
> > >>> > job is completed
> > >>> > - once that is done, BatchQueueClass needs to periodically check
> > >>> > the new table to see if the user's requested queue's current job
> > >>> > number < queue job limit. If it is, then we can pop jobs off and
> > >>> > submit them until we hit the job limit; if not, then we wait until
> > >>> > we're under the job limit.
> > >>> >
> > >>> > How does this sound?
> > >>> >
> > >>> > Doug
> > >>
> > >
>
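For reference, the BatchQueueClass proposal quoted above (a per-queue counter incremented on submission, decremented on completion, with held jobs drained while under the limit) could be sketched roughly like this. This is Python pseudocode with hypothetical names, not actual Airavata/Orchestrator code; `publish` stands in for the rabbitmq publish step:

```python
import threading

class BatchQueueThrottle:
    """Sketch of the proposed per-queue throttle: publish immediately while
    under the job limit, otherwise hold jobs and drain them as jobs finish."""

    def __init__(self, job_limit):
        self.job_limit = job_limit
        self.running = 0      # current job count for this queue (the counter)
        self.held = []        # jobs waiting for a free slot
        self._lock = threading.Lock()

    def submit(self, job, publish):
        """Publish the job if under the limit, otherwise hold it."""
        with self._lock:
            if self.running < self.job_limit:
                self.running += 1
                publish(job)
            else:
                self.held.append(job)

    def on_job_completed(self, publish):
        """Decrement the counter, then drain held jobs up to the limit."""
        with self._lock:
            self.running -= 1
            while self.held and self.running < self.job_limit:
                self.running += 1
                publish(self.held.pop(0))
```

In the real design the counter and limit would live in the new registry table rather than in memory, and the drain would be driven by the periodic check described above, but the control flow is the same. Note this still needs the failed-submission handling Yoshimoto describes, since the configured limit can drift from the resource's actual policy.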
