Re: JobManager failing to schedule jobs

Brett Palmer Thu, 14 Jul 2011 05:35:59 -0700

One feature that would help to prevent this problem in the future is a
configuration parameter in the service engine that would set the maximum
number of jobs the poller would process at a time.  Right now the poller
reads the JobSandbox and gets every job that has a status of Pending.  Then
it tries to change the status for each of these to running (or something
like that).  If the number of pending jobs is too large the poller will time
out before it can change the state of all the pending jobs.  Changing the
transaction timeout can help this problem but having another configuration
like "max-poll-jobs" could limit the number of pending jobs that are
processed in one transaction.  There is a configuration called "jobs" but I
don't think that is used by the polling process.
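
To make the idea concrete, here is a rough sketch of a bounded poll. It is
not OFBiz code: the class, the Status values, the in-memory list standing in
for the JobSandbox table, and the maxPollJobs parameter (modelled on the
proposed "max-poll-jobs" setting) are all hypothetical names made up for
illustration. Each call to pollOnce stands in for one short transaction that
claims only a limited number of pending jobs.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedPollSketch {
    // Hypothetical statuses; the real JobSandbox status names may differ.
    enum Status { PENDING, CLAIMED }

    static final class Job {
        final int id;
        Status status = Status.PENDING;
        Job(int id) { this.id = id; }
    }

    /**
     * Claims at most maxPollJobs pending jobs. In a real poller this loop
     * would run inside a single transaction, so the work per transaction
     * is bounded no matter how large the backlog is.
     */
    static List<Job> pollOnce(List<Job> jobSandbox, int maxPollJobs) {
        List<Job> claimed = new ArrayList<>();
        for (Job job : jobSandbox) {
            if (claimed.size() >= maxPollJobs) {
                break; // leave the rest for the next poll
            }
            if (job.status == Status.PENDING) {
                job.status = Status.CLAIMED;
                claimed.add(job);
            }
        }
        return claimed;
    }

    /** Counts the jobs that are still waiting to be claimed. */
    static int countPending(List<Job> jobSandbox) {
        int pending = 0;
        for (Job job : jobSandbox) {
            if (job.status == Status.PENDING) {
                pending++;
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // A backlog of 10,000 pending jobs, the size where problems started.
        List<Job> sandbox = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            sandbox.add(new Job(i));
        }
        int polls = 0;
        while (countPending(sandbox) > 0) {
            pollOnce(sandbox, 100); // assumed limit of 100 jobs per poll
            polls++;
        }
        System.out.println("polls needed: " + polls);
    }
}
```

With a limit like this the backlog drains over many small polls instead of
one large transaction that may time out, at the cost of taking more poll
cycles to pick everything up.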


I've tried to use the service engine as an asynchronous batch server but run
into problems when the number of pending jobs gets around 10,000.


Brett

On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman <bjf...@free-man.net> wrote:

> You are going to run into this from time to time for one reason or another.
> The approach I took was to spread the jobs out so they are not lumped
> together.
> Take a look at how the jobs are marshalled to be run.
>
> Josh Jacobson sent the following on 7/13/2011 8:35 PM:
> > Vacuum has been run, (took quite a while). Yeah, I see now that the
> > JobManager actually tries to update all the JobSandbox rows in the
> > transaction, so 60 seconds was pretty low.
> >
> > I am trying 10 minutes now and will see how that goes.
> >
> > I am using Postgres, by the way.
> >
> > Thanks for the help, I really appreciate it.
> >
> > --
> > Josh.
> >
> > On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
> >> Not sure what db you're using but it probably wouldn't hurt to run a
> >> vacuum on the table to speed up processing.
> >>
> >> By the way, I'm pretty sure the default timeout is 60 seconds so you
> >> might want to try something a little larger :-)
> >>
> >> Regards
> >> Scott
> >>
> >> On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:
> >>
> >>> I tried 60 seconds for timeout but that didn't work. I guess I'll
> >>> double it now and keep trying.
> >>>
> >>> I have about 260,000 pending jobs, and nothing is getting done.
> >>>
> >>> I know what you mean about purgeOldJobs. That service has crashed now
> >>> and I deleted old jobs from the database by hand. I was up to 2.6
> >>> million rows. OFBiz was pretty much unusable.
> >>>
> >>> If you have any other suggestions I'd love to hear them.
> >>>
> >>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
> >>>> Ah okay, that is entirely dependent on the number of jobs and the
> >>>> speed the server can process them.  As a side note I would keep a close
> >>>> eye on the purgeOldJobs service, when it starts falling over (transaction
> >>>> timeout again) then the number of rows in the table will increase quickly
> >>>> which in turn will slow down polling.
> >>>>
> >>>> In general the whole persisted jobs implementation is a bit fragile,
> >>>> especially when dealing with a large number of jobs.  I've wanted to
> >>>> replace it with something like quartz for a while but haven't had the time.
> >>>>
> >>>> Regards
> >>>> Scott
> >>>>
> >>>> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
> >>>>
> >>>>> Thanks again. I actually meant a suggestion for the transaction
> >>>>> timeout. In any case I am grateful for your explanation.
> >>>>>
> >>>>>
> >>>>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
> >>>>>> As best I can tell there shouldn't be any need to increase the
> >>>>>> interval between polls since the interval timer doesn't actually start
> >>>>>> until the previous poll has completed (see JobPoller.run()) so I can't
> >>>>>> see how a small interval would cause any backlog problems.
> >>>>>>
> >>>>>> I'm guessing if there is any lock contention then it's probably
> >>>>>> caused by the executing jobs trying to update their respective rows
> >>>>>> while the poller is holding a table lock.  So from that point of view I
> >>>>>> guess increasing the interval could reduce the amount of contention
> >>>>>> between the executing jobs and the next poll.
> >>>>>>
> >>>>>> Regards
> >>>>>> Scott
> >>>>>>
> >>>>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
> >>>>>>
> >>>>>>> Scott,
> >>>>>>>
> >>>>>>> Thanks! That is very precise advice. Do you have a suggestion on
> >>>>>>> interval time? 60 seconds? 120?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
> >>>>>>>> That configuration is for the frequency of job polls.  There isn't
> >>>>>>>> any ability to specify the transaction timeout via configuration so
> >>>>>>>> you'll need to modify the code directly:
> >>>>>>>> JobManager.java (line 148):
> >>>>>>>> beganTransaction = TransactionUtil.begin();
> >>>>>>>> needs to be changed to use TransactionUtil.begin(int)
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Scott
> >>>>>>>>
> >>>>>>>> HotWax Media
> >>>>>>>> http://www.hotwaxmedia.com
> >>>>>>>>
> >>>>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
> >>>>>>>>
> >>>>>>>>> Brett,
> >>>>>>>>>
> >>>>>>>>> Before I start trying to run the jobs manually, I want to give your
> >>>>>>>>> suggestion a try. I think I know where to configure the job polling
> >>>>>>>>> transaction time (I believe it's the poll-db-millis="20000" value in
> >>>>>>>>> framework/service/config/serviceengine.xml).
> >>>>>>>>>
> >>>>>>>>> However, I still don't know what to increase it to. I understand that
> >>>>>>>>> we wouldn't want to make it bigger than the default polling interval.
> >>>>>>>>> Do you know what the default interval between polling is?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <brettgpal...@gmail.com> wrote:
> >>>>>>>>>> I meant removing finished jobs.  If you have thousands of pending
> >>>>>>>>>> jobs then you will have the same problem I mentioned in my first
> >>>>>>>>>> email.  One resolution will be to increase the job poller
> >>>>>>>>>> transaction time.  In the ofbiz version I was using there was not a
> >>>>>>>>>> way to configure the poller transaction time.  It just used the
> >>>>>>>>>> default time.  I had to create a patch to allow this to happen.
> >>>>>>>>>>
> >>>>>>>>>> In the patch you had to be careful to not increase the transaction
> >>>>>>>>>> time greater than the frequency of the job poller.  Otherwise you
> >>>>>>>>>> get into a lock situation where one job poller is still running
> >>>>>>>>>> within a transaction and another poller starts.  This didn't create
> >>>>>>>>>> a huge problem but the second job poller would usually lock and
> >>>>>>>>>> then time out.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Brett
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <josh.s.jacob...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Brett,
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>
> >>
> >
>
