On Tue, Sep 23, 2014 at 4:04 PM, Shameera Rathnayaka <[email protected] > wrote:
> Hi Lahiru,
>
> I was able to resolve this by moving the job throttle logic to the
> launchExperiment method and synchronizing the jobSubmitter.submit and job
> count update tasks. This introduces a small performance bottleneck. If we
> can tolerate that bottleneck in the job submission phase, this will work
> without an issue as long as we have one Orchestrator in our deployment.
> WDYT? Can we go with this and later change it to a better way?
>

OK

Regards
Lahiru

>
> Thanks,
> Shameera.
>
> On Tue, Sep 23, 2014 at 2:06 PM, Shameera Rathnayaka <
> [email protected]> wrote:
>
>> Hi Lahiru,
>>
>> On Tue, Sep 23, 2014 at 1:38 PM, Lahiru Gunathilake <[email protected]>
>> wrote:
>>
>>> It's wrong to update the count before a successful job submission
>>> (because the job submission might ultimately fail, and then it is not
>>> the actual count in the queue), and even if we do it in the same place
>>> there will always be a race condition.
>>>
>>
>> Can't we say that if the jobSubmitter.submit(..) method returns "true",
>> the job has been submitted to the compute resource without any issue? If
>> we can, then increasing the job count after the submit operation would
>> solve our issue to some extent (yes, I can see it is hard to completely
>> fix the race condition).
>>
>>
>>> If we really want to fix this we have to implement a queue-based
>>> approach where GFac will pick jobs from a worker queue, and if the count
>>> is exceeded we delay the job submission.
>>>
>>
>> Are you suggesting moving the scheduling part to GFac instead of doing it
>> in the Orchestrator? And is this a global queue that every GFac node can
>> access, or a queue per GFac node?
>>
>>
>>> On Tue, Sep 23, 2014 at 1:04 PM, Shameera Rathnayaka <
>>> [email protected]> wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> I am working on the queue-based job throttling implementation, and
>>>> here is the related JIRA [1] ticket, which was created to track the
>>>> implementation steps.
>>>>
>>>> The following explains how job throttling is implemented for now. It
>>>> only applies to compute resources that have batch queues defined.
>>>>
>>>> There is a validator called JobCountValidator. This validator checks
>>>> whether there is enough space to submit a new job and returns "true"
>>>> or "false" accordingly. I am using ZooKeeper to track runtime data
>>>> such as how many jobs have been submitted to a given host. With the
>>>> current implementation, the job count is increased when the job is
>>>> added to the monitoring queue and decreased when the job is removed
>>>> from the monitoring queue. I ran a few tests and this approach worked
>>>> fine. But after I ran a load test at a high rate, I observed that this
>>>> approach does not work, because we do the validation in the
>>>> Orchestrator and the job count update in GFac. This is due to a race
>>>> condition: the Orchestrator can still pass the validation step even
>>>> when we have already submitted the allowed max job count to a resource
>>>> but have not yet updated the job count in ZooKeeper. Therefore we need
>>>> to do the job submission and the job count increase in the same place
>>>> to fix that.
>>>>
>>>> So a potential place is the SimpleOrchestratorImpl#launchExperiment
>>>> method. WDYT?
>>>>
>>>> As the validation and launch operations are called using two client
>>>> calls, we still have that race condition. I have sent a separate mail
>>>> for that.
>>>>
>>>> Thanks,
>>>> Shameera.
>>>>
>>>> --
>>>> Best Regards,
>>>> Shameera Rathnayaka.
>>>>
>>>> email: shameera AT apache.org , shameerainfo AT gmail.com
>>>> Blog : http://shameerarathnayaka.blogspot.com/
>>>>
>>>
>>> --
>>> Research Assistant
>>> Science Gateways Group
>>> Indiana University
>>>
>>
>> --
>> Best Regards,
>> Shameera Rathnayaka.
>>
>> email: shameera AT apache.org , shameerainfo AT gmail.com
>> Blog : http://shameerarathnayaka.blogspot.com/
>>
>
> --
> Best Regards,
> Shameera Rathnayaka.
>
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/
>

--
Research Assistant
Science Gateways Group
Indiana University
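
For readers of the archive, the fix agreed on in this thread (check the count, submit, and increase the count only after a successful submit, all under one lock) could look roughly like the sketch below. It is not the actual Airavata code: the JobSubmitter interface, the ThrottledLauncher class and its methods are hypothetical names, and an in-memory map stands in for the job counts that the thread says are kept in ZooKeeper. As noted above, a single in-process lock only helps when there is one Orchestrator in the deployment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the component that hands a job to a compute resource. */
interface JobSubmitter {
    /** Returns true only if the resource accepted the job. */
    boolean submit(String host, String jobId);
}

/** Sketch: validate, submit, and count inside one critical section. */
class ThrottledLauncher {
    private final JobSubmitter jobSubmitter;
    private final int maxJobsPerHost;
    // Assumption: in-memory stand-in for the per-host counts tracked in ZooKeeper.
    private final Map<String, Integer> jobCounts = new HashMap<>();
    private final Object lock = new Object();

    ThrottledLauncher(JobSubmitter jobSubmitter, int maxJobsPerHost) {
        this.jobSubmitter = jobSubmitter;
        this.maxJobsPerHost = maxJobsPerHost;
    }

    /** Returns true if the job was submitted; false if throttled or if the submit failed. */
    boolean launch(String host, String jobId) {
        synchronized (lock) {
            int current = jobCounts.getOrDefault(host, 0);
            if (current >= maxJobsPerHost) {
                return false; // limit reached, do not submit
            }
            if (!jobSubmitter.submit(host, jobId)) {
                return false; // failed submit must not change the count
            }
            jobCounts.put(host, current + 1); // count only successful submissions
            return true;
        }
    }

    /** Called when a job leaves the monitoring queue. */
    void jobFinished(String host) {
        synchronized (lock) {
            int current = jobCounts.getOrDefault(host, 0);
            if (current > 0) {
                jobCounts.put(host, current - 1);
            }
        }
    }

    int jobCount(String host) {
        synchronized (lock) {
            return jobCounts.getOrDefault(host, 0);
        }
    }
}
```

Because the check and the increment happen under the same lock as the submit call, two concurrent launches cannot both pass validation against a stale count. The cost is that submissions are serialized, which is the small bottleneck mentioned in the thread.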
