Yes, that's the idea. In general, something dynamic and adaptable will probably be more robust than a rigid limit.
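
A rough Java sketch of that submit-then-react approach is below (the class
and method names are hypothetical, not actual Airavata or scheduler APIs):
submit the job, and if the resource rejects it, park it on a retry queue
instead of trying to predict the policy limit up front.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch: submit first, then react to the scheduler's answer
    // instead of trying to mirror the resource's policy limits locally.
    public class SubmitAndRetrySketch {

        // Placeholder for whatever Airavata uses to describe a job;
        // not a real Airavata type.
        static class JobRequest {
            final String experimentId;
            JobRequest(String experimentId) { this.experimentId = experimentId; }
        }

        private final BlockingQueue<JobRequest> retryQueue = new LinkedBlockingQueue<>();

        // trySubmit() stands in for the real submission call to the resource;
        // here it just pretends the scheduler rejects every third job.
        private boolean trySubmit(JobRequest job) {
            return Math.abs(job.experimentId.hashCode()) % 3 != 0;
        }

        public void submitOrDefer(JobRequest job) throws InterruptedException {
            if (trySubmit(job)) {
                System.out.println("accepted: " + job.experimentId);
            } else {
                // Rejected (policy limit hit, queue full, etc.): park it and try
                // again later, or hand it to a different resource.
                retryQueue.put(job);
                System.out.println("deferred: " + job.experimentId);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            SubmitAndRetrySketch s = new SubmitAndRetrySketch();
            for (int i = 0; i < 5; i++) {
                s.submitOrDefer(new JobRequest("exp-" + i));
            }
        }
    }

The point is that nothing in this loop needs to know what the limit is; it only
needs to recognize a rejection and keep the job around for another attempt.
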
On Mon, Aug 03, 2015 at 01:15:50PM -0400, John Weachock wrote:
> Ah! I think I understand what you're saying now. Rather than trying to
> ensure we stay within the policy limits, we should just submit a job and
> check if it was accepted or not. If it was rejected, we can add it to a
> queue to be resubmitted at a later time or to a different resource. Is
> this correct?
>
> On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <[email protected]> wrote:
>
> > The point is that the policy limit could change at any time.
> > If it does, and there is a mismatch between the limit at the resource
> > and the limit in Airavata, bad things will happen. Schedulers
> > will vary in the format of their policy limit output, so it's
> > more reliable to monitor actual job submissions and handle failures.
> > Remember that it's possible for job limits to vary for a single
> > resource not only by queue name, but by job characteristics,
> > such as allocation account, core count, wall clock limit, etc.
> >
> > On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
> > > These limits are usually set as a policy by the resource provider
> > > and rarely change. As long as we have a placeholder to
> > > configure/change it in Airavata for a user/gateway, we don't need
> > > to get it from a resource.
> > >
> > > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <[email protected]> wrote:
> > >
> > > > I think it would be best for us not to maintain our own record of
> > > > the job limit - we need to remember that jobs will be submitted to
> > > > these resources using the community accounts through other methods
> > > > as well. I think I remember someone mentioning that it would be
> > > > ideal to poll the resources for their limits - can anyone confirm
> > > > that we can do this?
> > > >
> > > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <[email protected]> wrote:
> > > >
> > > >> Hmm @shameera, that's very true. Perhaps we can store the
> > > >> submission requests in the registry. In the event that the
> > > >> orchestrator goes down, we can recover them from the registry
> > > >> afterwards.
> > > >>
> > > >> @Yoshimoto, I didn't think about that - will take it into
> > > >> consideration. Thanks for the insight!
> > > >>
> > > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <[email protected]> wrote:
> > > >>
> > > >>> I think you also want to put in a check for successful submission,
> > > >>> then take appropriate action on failed submission. It can be
> > > >>> difficult to keep the submission limit up-to-date.
> > > >>>
> > > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > > >>> > Hey Devs,
> > > >>> >
> > > >>> > Just wanted to get some input on our plan to implement the queue
> > > >>> > throttling feature.
> > > >>> >
> > > >>> > Batch Queue Throttling:
> > > >>> > - in the Orchestrator, the current submit() function in
> > > >>> >   GFACPassiveJobSubmitter publishes jobs to RabbitMQ immediately
> > > >>> > - instead of publishing immediately, we should pass the messages
> > > >>> >   to a new component, call it BatchQueueClass
> > > >>> > - this new BatchQueueClass component needs to periodically check
> > > >>> >   when we can unload jobs to submit
> > > >>> >
> > > >>> > Adding BatchQueueClass:
> > > >>> > - set up a new table (or tables) to contain compute resource
> > > >>> >   names and their corresponding queues' current job numbers and
> > > >>> >   maximum job limits
> > > >>> > - data models in Airavata have information on maximum job
> > > >>> >   submission limits for a queue, but no data on how many jobs are
> > > >>> >   currently running
> > > >>> > - the current job number will effectively act as a counter, which
> > > >>> >   will be incremented when a job is submitted and decremented
> > > >>> >   when a job completes
> > > >>> > - once that is in place, BatchQueueClass needs to periodically
> > > >>> >   check the new table to see if the user's requested queue's
> > > >>> >   current job number < queue job limit. If it is, then we can pop
> > > >>> >   jobs off and submit them until we hit the job limit; if not,
> > > >>> >   then we wait until we're under the job limit.
> > > >>> >
> > > >>> > How does this sound?
> > > >>> >
> > > >>> > Doug
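
For reference against the BatchQueueClass proposal quoted above, here is a
minimal, self-contained Java sketch of the counter-based check. All class,
method, and queue names are hypothetical, not existing Airavata APIs; in the
real implementation the counts and limits would live in the registry tables
Doug describes, and the publish step would hand the job to GFac via RabbitMQ.

    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch of the BatchQueueClass idea: keep a per-queue counter
    // against a configured limit, and drain pending jobs only while the counter
    // is below that limit.
    public class BatchQueueThrottleSketch {

        static class QueueState {
            final int maxJobs;                      // policy limit configured for this queue
            final AtomicInteger runningJobs = new AtomicInteger(0);
            final Queue<String> pending = new ConcurrentLinkedQueue<>();
            QueueState(int maxJobs) { this.maxJobs = maxJobs; }
        }

        private final Map<String, QueueState> queues = new ConcurrentHashMap<>();

        public void configureQueue(String queueName, int maxJobs) {
            queues.put(queueName, new QueueState(maxJobs));
        }

        // Called by the orchestrator instead of publishing to RabbitMQ directly.
        public void enqueue(String queueName, String jobId) {
            queues.get(queueName).pending.add(jobId);
        }

        // Called when a job finishes, freeing a slot.
        public void jobCompleted(String queueName) {
            queues.get(queueName).runningJobs.decrementAndGet();
        }

        // Run periodically (e.g. by a scheduled executor): pop jobs while under the limit.
        public void drain(String queueName) {
            QueueState q = queues.get(queueName);
            String jobId;
            while (q.runningJobs.get() < q.maxJobs && (jobId = q.pending.poll()) != null) {
                q.runningJobs.incrementAndGet();
                publishToRabbitMq(jobId);           // stand-in for the real publish step
            }
        }

        private void publishToRabbitMq(String jobId) {
            System.out.println("publishing job " + jobId);
        }

        public static void main(String[] args) {
            BatchQueueThrottleSketch t = new BatchQueueThrottleSketch();
            t.configureQueue("normal", 2);          // e.g. a policy limit of 2 jobs
            t.enqueue("normal", "job-1");
            t.enqueue("normal", "job-2");
            t.enqueue("normal", "job-3");
            t.drain("normal");                      // publishes job-1 and job-2, holds job-3
            t.jobCompleted("normal");               // a slot frees up
            t.drain("normal");                      // now job-3 can go
        }
    }

The useful property here is that the counter is only ever compared against a
configurable limit, so changing the limit in the registry takes effect on the
next drain without any code changes. It does not, however, remove the need for
the rejection handling discussed above, since the resource-side policy can
still change out from under the stored limit.
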
