Hi Jacques,

I'm working on implementing the priority queue approach at the moment for a client. All going well, it will be in production in a couple of weeks and I'll report back then with a patch.
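In case it helps the discussion in the meantime, the rough shape of what I'm experimenting with is below. It's only a sketch using plain JDK classes and illustrative names (none of it is the actual OFBiz JobPoller code): a Comparable wrapper so that low-priority work such as purge jobs sorts to the back of a PriorityBlockingQueue, with a sequence number as a tie-breaker to keep FIFO ordering within a priority.

    import java.util.concurrent.PriorityBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class PriorityQueueSketch {

        /**
         * Wraps a job with an explicit priority so that a PriorityBlockingQueue
         * drains high-priority work first. Lower number = higher priority, so a
         * purge job would be given a large value and sort to the back.
         */
        static class PrioritizedJob implements Runnable, Comparable<PrioritizedJob> {
            private static final AtomicLong SEQUENCE = new AtomicLong();

            private final int priority;
            private final long order = SEQUENCE.getAndIncrement(); // FIFO tie-breaker within a priority
            private final Runnable delegate;

            PrioritizedJob(int priority, Runnable delegate) {
                this.priority = priority;
                this.delegate = delegate;
            }

            @Override
            public void run() {
                delegate.run();
            }

            @Override
            public int compareTo(PrioritizedJob other) {
                int byPriority = Integer.compare(this.priority, other.priority);
                return byPriority != 0 ? byPriority : Long.compare(this.order, other.order);
            }
        }

        public static void main(String[] args) {
            // The PriorityBlockingQueue is unbounded, so the poller still has to
            // limit how many jobs it offers to the executor (see the poll()
            // discussion further down the thread).
            ThreadPoolExecutor executor = new ThreadPoolExecutor(
                    2, 2, 60, TimeUnit.SECONDS, new PriorityBlockingQueue<>());

            // Whatever lands in the queue is drained in priority order, so a purge
            // job can never hold back user-facing work that arrives after it.
            executor.execute(new PrioritizedJob(100, () -> System.out.println("purge job")));
            executor.execute(new PrioritizedJob(1, () -> System.out.println("async data import")));
            executor.execute(new PrioritizedJob(50, () -> System.out.println("regular job")));

            executor.shutdown();
        }
    }

The unbounded nature of PriorityBlockingQueue is still the awkward part, which is why the poll limit change discussed below matters.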
Regards,
Scott

On Tue, 26 Feb 2019 at 03:11, Jacques Le Roux <jacques.le.r...@les7arts.com> wrote:

> Hi,
>
> I put this comment there with OFBIZ-10002, trying to document why we have 5
> as the hardcoded value of the /max-threads/ attribute in the /thread-pool/
> element (serviceengine.xml). At that time Scott had already mentioned[1]:
>
> /Honestly I think the topic is generic enough that OFBiz doesn't need to
> provide any information at all. Thread pool sizing is not exclusive to
> OFBiz and it would be strange for anyone to modify the numbers without
> first researching sources that provide far more detail than a few
> sentences in our config files will ever cover./
>
> I agree with Scott and Jacopo that jobs are more likely I/O bound than CPU
> bound. So I agree that we should take that into account, change the current
> algorithm and remove this somewhat misleading comment. Scott's suggestion in
> his 2nd email sounds good to me. If I understood it well, we could use an
> unbounded but still effectively limited queue, as it was before:
>
>     Although with all of that said, after a quick second look it appears that
>     the current implementation doesn't try to poll for more jobs than the
>     configured limit (minus already queued jobs) so we might be fine with an
>     unbounded queue implementation. We'd just need to alter the call to
>     JobManager.poll(int limit) to not pass in
>     executor.getQueue().remainingCapacity() and instead pass in something like
>     (threadPool.getJobs() - executor.getQueue().size())
>
> I'm fine with that, as it would continue to prevent hitting physical
> limitations and can be tweaked by users as it is now. Note though that it
> seems hard to tweak, as we have already received several "complaints" about
> it.
>
> Now, one of the advantages of a PriorityBlockingQueue is priority. To take
> advantage of that we can't rely on "/natural ordering/" and need to
> implement Comparable (which does not seem easy). Nicolas provided some leads
> below and this should be discussed. Ideally that would be parametrised, of
> course.
>
> My 2 cts
>
> [1] https://markmail.org/message/ixzluzd44rgloa2j
>
> Jacques
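Just to make the suggestion quoted above a bit more concrete, the limit computation would be something along these lines. It's a sketch using only plain JDK types and an illustrative class/method name, not the actual JobPoller change:

    import java.util.concurrent.ThreadPoolExecutor;

    final class PollLimitSketch {
        /**
         * With an unbounded queue, remainingCapacity() is no longer meaningful
         * (PriorityBlockingQueue always returns Integer.MAX_VALUE), so instead
         * cap each poll at the configured job limit (threadPool.getJobs() in the
         * quote above) minus whatever is already sitting in the executor's queue.
         */
        static int pollLimit(ThreadPoolExecutor executor, int configuredJobLimit) {
            int alreadyQueued = executor.getQueue().size();
            return Math.max(0, configuredJobLimit - alreadyQueued);
        }
    }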
> On 06/02/2019 at 14:24, Nicolas Malin wrote:
> > Hello Scott,
> >
> > On a customer project we use the job manager massively, with an average of
> > one hundred thousand jobs per day.
> >
> > We have different cases: huge long-running jobs, async persistent jobs, and
> > fast regular jobs. The main problem we detected has been (as you noted) the
> > long jobs that block the poller's threads, and when we restart OFBiz (we
> > are on continuous delivery) we had no window to do this without crashing
> > some jobs.
> >
> > To address this, we tried with Gil to analyze whether we could add some
> > weighting to the job definition, to help the job manager decide which jobs
> > from the pending queue it can push to the queued queue. We changed our
> > approach to create two pools: one for system maintenance and huge long
> > jobs, managed by two OFBiz instances, and another for user activity jobs,
> > also managed by two instances. We also added to the service definition an
> > indication of the preferred pool.
> >
> > This isn't a big deal and doesn't resolve the stuck pool, but none of the
> > blocked jobs are vital for daily activity.
> >
> > For crashed jobs, we introduced in trunk a service lock that we set before
> > an update, and we then wait for a window for the restart.
> >
> > Currently, for every OOM detected we reanalyse the originating job and try
> > to decompose it into persistent async services to help spread the load.
> >
> > If I had more time, I would orient job improvements towards:
> >
> > * Defining an execution plan rule to link services and pollers without
> > touching any service definition
> >
> > * Defining per-instance configuration for the job vacuum, to refine it by
> > service volume
> >
> > This feedback is a little confused, Scott, but maybe you'll find something
> > interesting in it.
> >
> > Nicolas
> >
> > On 30/01/2019 20:47, Scott Gray wrote:
> >> Hi folks,
> >>
> >> Just jotting down some issues with the JobManager noticed over the last
> >> few days:
> >> 1. min-threads in serviceengine.xml is never exceeded unless the job count
> >> in the queue exceeds 5000 (or whatever is configured). Is this not obvious
> >> to anyone else? I don't think this was the behavior prior to a refactoring
> >> a few years ago.
> >> 2. The advice on the number of threads to use doesn't seem good to me, it
> >> assumes your jobs are CPU bound when in my experience they are more likely
> >> to be I/O bound while making db or external API calls, sending emails etc.
> >> With the default setup, it only takes two long running jobs to effectively
> >> block the processing of any others until the queue hits 5000 and the other
> >> threads are finally opened up. If you're not quickly maxing out the queue
> >> then any other jobs are stuck until the slow jobs finally complete.
> >> 3. Purging old jobs doesn't seem to be well implemented to me, from what
> >> I've seen the system is only capable of clearing a few hundred per minute,
> >> and if you've filled the queue with them then regular jobs have to queue
> >> behind them and can take many minutes to finally be executed.
> >>
> >> I'm wondering if anyone has experimented with reducing the queue size?
> >> I'm considering reducing it to say 100 jobs per thread (along with
> >> increasing the thread count). In theory it would reduce the time real jobs
> >> have to sit behind PurgeJobs and would also open up additional threads for
> >> use earlier.
> >>
> >> Alternatively I've pondered trying a PriorityBlockingQueue for the job
> >> queue (unfortunately the implementation is unbounded though, so it isn't a
> >> drop-in replacement) so that PurgeJobs always sit at the back of the
> >> queue. It might also allow prioritizing certain "user facing" jobs (such
> >> as asynchronous data imports) over lower priority, less time critical
> >> jobs. Maybe another option (or in conjunction) is some sort of "swim-lane"
> >> queue/executor that allocates jobs to threads based on prior execution
> >> speed, so that slow running jobs can never use up all threads and block
> >> faster jobs.
> >>
> >> Any thoughts/experiences you have to share would be appreciated.
> >>
> >> Thanks
> >> Scott
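PS: for anyone who wants to see point 1 of my original email in isolation, below is a small standalone example (plain JDK, nothing OFBiz-specific, numbers picked just for the demo) showing that a ThreadPoolExecutor only grows past its core pool size once the work queue is full, which is why min-threads is effectively the ceiling until the queue hits its configured limit:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class CorePoolDemo {
        public static void main(String[] args) {
            // min-threads / max-threads equivalents, with a deliberately tiny
            // bounded queue so the effect is visible without 5000 jobs.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    1, 5, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(3));

            // 1 task runs on the single core thread, 3 more fill the queue...
            for (int i = 0; i < 4; i++) {
                pool.execute(CorePoolDemo::slowTask);
            }
            // ...and the pool still has only 1 thread, even though max is 5.
            System.out.println("threads before the queue is full: " + pool.getPoolSize());

            // Only once the queue rejects an offer does the pool grow past core size.
            pool.execute(CorePoolDemo::slowTask);
            System.out.println("threads after the queue is full:  " + pool.getPoolSize());

            pool.shutdown();
        }

        private static void slowTask() {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }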