Re: JobManager failing to schedule jobs

The key is the "Transaction timeout" message.
It could be caused by the job length,
or by the database connection.

Please specify the version of OFBiz you are running, since earlier transaction problems
were taken care of by changing the code that deals with transactions.

Josh Jacobson sent the following on 7/13/2011 11:48 AM:
 Hello Everyone,
 
 I have an ofbiz instance in production where none of the jobs are
 being performed. I have about 160K jobs in pending status, but they
 are never being scheduled.
 I can see the following in the log:
 
 2011-07-13 13:32:01,959 (org.ofbiz.service.job.JobPoller@2599930b) [
 JobManager.java:201:ERROR]  exception report
 -- Transaction
 error trying to commit when polling and updating the JobSandbox:
 org.ofbiz.entity.transaction.GenericTransactionException: Roll back
 error (with no rollbackOnly cause found), could not commit
 transaction, was rolled back instead:
 javax.transaction.RollbackException: Transaction timeout (Transaction
 timeout) Exception:
 org.ofbiz.entity.transaction.GenericTransactionException Message: Roll
 back error (with no rollbackOnly cause found), could not commit
 transaction, was rolled back instead:
 javax.transaction.RollbackException: Transaction timeout (Transaction
 timeout)  cause
 -
 Exception: javax.transaction.RollbackException Message: Transaction
 timeout  stack trace
 ---
 javax.transaction.RollbackException: Transaction timeout
 org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:269)
 org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:245)
 org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:259)
 org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:245)
 org.ofbiz.service.job.JobManager.poll(JobManager.java:197)
 org.ofbiz.service.job.JobPoller.run(JobPoller.java:90)
 java.lang.Thread.run(Thread.java:619)
 
 
 I believe that the JobManager is unable to handle all those
 jobs to schedule them, so nothing is being scheduled, which of course
 makes the job list longer.
 
 Can anyone think of how to make the jobs run?
 
 All help much appreciated,

Re: JobManager failing to schedule jobs

2011-07-13 Thread Brett Palmer

Josh,

I've also seen this problem if the JobSandbox table has too many rows to
process.  I ran into a similar problem when I tried to run 10,000 Async
batch processes.  The time it took for the JobPoller to process all the
records was too long and the transaction would time out.

I had a patch to change the transaction timeout for the JobPoller
specifically as it wasn't available in ofbiz at the time, but I don't think
I ever submitted it.  I could look for this patch if anyone is interested
but it may already be implemented in the framework.

I would try archiving jobs from the JobSandbox first.


Brett

Re: JobManager failing to schedule jobs

BJ,

I am running 10.04.

On Wed, Jul 13, 2011 at 12:00 PM, BJ Freeman bjf...@free-man.net wrote:
 The key is the "Transaction timeout" message.
 It could be caused by the job length,
 or by the database connection.

 Please specify the version of OFBiz you are running, since earlier transaction problems
 were taken care of by changing the code that deals with transactions.

Re: JobManager failing to schedule jobs

Brett,

Can you please explain what you mean by archiving the current JobSandbox first?
Do you mean somehow removing the current pending jobs, applying your
patch and then copying them back again?

Thanks,


On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer brettgpal...@gmail.com wrote:
 Josh,

 I've also seen this problem if the JobSandbox table has too many rows to
 process.  I ran into a similar problem when I tried to run 10,000 Async
 batch processes.  The time it took for the JobPoller to process all the
 records was too long and the transaction would time out.

 I had a patch to change the transaction timeout for the JobPoller
 specifically as it wasn't available in ofbiz at the time, but I don't think
 I ever submitted it.  I could look for this patch if anyone is interested
 but it may already be implemented in the framework.

 I would try archiving jobs from the JobSandbox first.


 Brett

Re: JobManager failing to schedule jobs

OK, so you have the latest code.
What is the environment you are working with?
OS
Memory
CPU speed

Josh Jacobson sent the following on 7/13/2011 12:12 PM:
 BJ,
 
 I am running 10.04.
 
Re: JobManager failing to schedule jobs

2011-07-13 Thread Brett Palmer

I meant removing finished jobs. If you have thousands of pending jobs then
you will have the same problem I mentioned in my first email. One
resolution will be to increase the job poller transaction time. In the
ofbiz version I was using there was not a way to configure the poller
transaction time. It just used the default time. I had to create a patch
to allow this to happen.

In the patch you had to be careful not to increase the transaction time
beyond the polling interval of the job poller. Otherwise you get into a lock
situation where one job poller is still running within a transaction and
another poller starts. This didn't create a huge problem but the second job
poller would usually lock and then time out.
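
As a rough illustration of the constraint described above, the sketch below caps a requested poller transaction timeout at the polling interval. The class and method names are hypothetical and are not part of OFBiz; where the resulting value would be applied depends on the OFBiz version and is not shown.

```java
/** Hypothetical helper, not part of OFBiz: bounds a poller transaction timeout. */
final class PollerTimeout {
    private PollerTimeout() {}

    /**
     * Returns the requested timeout in seconds, capped at the polling interval,
     * so that one poll's transaction cannot outlive the wait before the next poll.
     */
    static int boundedTimeoutSeconds(int requestedSeconds, int pollIntervalSeconds) {
        if (requestedSeconds <= 0 || pollIntervalSeconds <= 0) {
            throw new IllegalArgumentException("timeout and poll interval must be positive");
        }
        return Math.min(requestedSeconds, pollIntervalSeconds);
    }
}
```

For example, with a 30-second polling interval a requested 120-second timeout would be reduced to 30 seconds, while a 20-second request would be left alone.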

Brett

On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:

Brett,

Can you please explain what you mean by archiving the current JobSandbox first?
Do you mean somehow removing the current pending jobs, applying your
patch and then copying them back again?

Thanks,

Re: JobManager failing to schedule jobs

Currently I am running:

Red Hat Enterprise Linux Server release 5.5
6 CPUs, 16384MB RAM

It was very recently upgraded from 2 CPUs and 8GB of RAM because we
were having performance issues (lots of swap memory being used). It's
on one of those cloud servers. Now it's running without using any
swap.

On Wed, Jul 13, 2011 at 12:22 PM, BJ Freeman bjf...@free-man.net wrote:
 OK, so you have the latest code.
 What is the environment you are working with?
 OS
 Memory
 CPU speed

Re: JobManager failing to schedule jobs

On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com wrote:
 I meant removing finished jobs.  If you have thousands of pending jobs then
 you will have the same problem I mentioned in my first email.  One
 resolution will be to increase the job poller transaction time.  In the
 ofbiz version I was using there was not a way to configure the poller
 transaction time.  It just used the default time.  I had to create a patch
 to allow this to happen.

I see. I already did that: We had 2.6 million lines on the JobSandbox,
mostly of completed or failed jobs. We deleted completed and failed
and are now looking at about 260L pending jobs. I want to run those
jobs, so I can get the machine back to normal.


 In the patch you had to be careful to not increase the transaction time
 greater than the frequency of the job poller.  Otherwise you get into a lock
 situation where one job poller is still running within a transaction and
 another poller starts.  This didn't create a huge problem but the second job
 poller would usually lock and then time out.

I understand the possible race condition. So how do I figure out what to
set the timeout to, and where do I configure that?

Thanks,

--
Josh.

Re: JobManager failing to schedule jobs

On Wed, Jul 13, 2011 at 12:51 PM, Josh Jacobson
josh.s.jacob...@gmail.com wrote:
 I see. I already did that: We had 2.6 million lines on the JobSandbox,
 mostly of completed or failed jobs. We deleted completed and failed
 and are now looking at about 260L pending jobs. I want to run those
 jobs, so I can get the machine back to normal.

Sorry, just noticed the typo: We currently have 260K + jobs as pending
and I want to process them to get things back to normal.

Thanks for the help,

--
Josh.

Re: JobManager failing to schedule jobs

You now know why I don't recommend a cloud configuration for real-time
operations, unless you're running over dedicated lines that are not part of the
internet.
To summarize: your environment caused the problem, not OFBiz.
Now you have jobs queued that should have been run but have piled up.
You need a way to get the jobs run so they don't time out the system.
I recommend you look at the purge old jobs service, copy and modify it
to run your jobs, maybe grouped by time.
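
One possible way to work through a backlog in groups, sketched here under assumptions: split the pending job IDs into fixed-size batches so that each batch can be handled in its own short transaction instead of one long one. `JobBatcher` and `partition` are hypothetical names, not OFBiz APIs, and the batch size is something you would have to tune. Loading the IDs from the JobSandbox and actually running each job are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OFBiz: splits a backlog into fixed-size batches. */
final class JobBatcher {
    private JobBatcher() {}

    /** Returns consecutive batches of at most batchSize items, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each returned batch would then be processed and committed separately, so a single slow batch can time out without blocking the rest of the backlog.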

Josh Jacobson sent the following on 7/13/2011 12:48 PM:
 Currently I am running:
 
 Red Hat Enterprise Linux Server release 5.5
 6 CPUs, 16384MB RAM
 
 It was very recently upgraded from 2 CPUs and 8GB of RAM because we
 were having performance issues (lots of swap memory being used). It's
 on one of those cloud servers. Now it's running without using any
 swap.
 
Re: JobManager failing to schedule jobs

Thanks for the pointers. I'll take a look.

There is one more piece of information: The purgeOldJobs service is in
a crashed status. Do you think that is significant?

Thanks,

On Wed, Jul 13, 2011 at 4:32 PM, BJ Freeman bjf...@free-man.net wrote:
 You now know why I don't recommend a cloud configuration for real-time
 operations, unless you're running over dedicated lines that are not part of the
 internet.
 To summarize: your environment caused the problem, not OFBiz.
 Now you have jobs queued that should have been run but have piled up.
 You need a way to get the jobs run so they don't time out the system.
 I recommend you look at the purge old jobs service, copy and modify it
 to run your jobs, maybe grouped by time.

Re: JobManager failing to schedule jobs

It means finished jobs will not be purged, so you will get a build-up.
You can run the service manually to start it again.


Josh Jacobson sent the following on 7/13/2011 4:41 PM:
 Thanks for the pointers. I'll take a look.
 
 There is one more piece of information: The purgeOldJobs service is in
 a crashed status. Do you think that is significant?
 
 Thanks,
 
Re: JobManager failing to schedule jobs

Thanks, that is what I figured. First things first though: I need to
get those jobs running somehow.

Thanks for the help.

On Wed, Jul 13, 2011 at 4:46 PM, BJ Freeman bjf...@free-man.net wrote:
It means finished jobs will not be purged, so you will get a build-up.
You can run the service manually to start it again.

Josh Jacobson sent the following on 7/13/2011 4:41 PM:
Thanks for the pointers. I'll take a look.

There is one more piece of information: The purgeOldJobs service is in
a crashed status. Do you think that is significant?

Thanks,

On Wed, Jul 13, 2011 at 4:32 PM, BJ Freeman bjf...@free-man.net wrote:
You now know why I don't recommend cloud configurations for real-time
operations, unless you're running over dedicated lines that are not part of
the internet.
To summarize, your environment caused the problem, not ofbiz.
Now you have queued jobs that should have been run but have piled up.
You need a way to get the jobs run so they don't time out the system.
I recommend you look at the purge old jobs service, then copy and modify it
to run your jobs, maybe by time group.

Josh Jacobson sent the following on 7/13/2011 12:48 PM:
Currently I am running:

Red Hat Enterprise Linux Server release 5.5
6 CPUs, 16384MB RAM

It was very recently upgraded from 2 CPUs and 8GB of RAM because we
were having performance issues (lots of swap memory being used). It's
on one of those cloud servers. Now it's running without using any
swap.

On Wed, Jul 13, 2011 at 12:22 PM, BJ Freeman bjf...@free-man.net wrote:
OK, so you have the latest code.
What is the environment you are working with?
OS
Memory
CPU speed

Josh Jacobson sent the following on 7/13/2011 12:12 PM:
BJ,

I am running 10.04.

On Wed, Jul 13, 2011 at 12:00 PM, BJ Freeman bjf...@free-man.net wrote:
the key is Transaction timeout
this could be the job length
could be the database connection

please specify the version of ofbiz since earlier transaction problems
were taken care of by changing code that deals with transactions.

Josh Jacobson sent the following on 7/13/2011 11:48 AM:
Hello Everyone,

I have an ofbiz instance in production where none of the jobs are
being performed. I have about 160K jobs in pending status, but they
are never being scheduled.
I can see the following in the log:

2011-07-13 13:32:01,959 (org.ofbiz.service.job.JobPoller@2599930b) [
JobManager.java:201:ERROR] exception report
-- Transaction
error trying to commit when polling and updating the JobSandbox:
org.ofbiz.entity.transaction.GenericTransactionException: Roll back
error (with no rollbackOnly cause found), could not commit
transaction, was rolled back instead:
javax.transaction.RollbackException: Transaction timeout (Transaction
timeout) Exception:
org.ofbiz.entity.transaction.GenericTransactionException Message: Roll
back error (with no rollbackOnly cause found), could not commit
transaction, was rolled back instead:
javax.transaction.RollbackException: Transaction timeout (Transaction
timeout) cause
-
Exception: javax.transaction.RollbackException Message: Transaction
timeout stack trace
---
javax.transaction.RollbackException: Transaction timeout
org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:269)
org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:245)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:259)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:245)
org.ofbiz.service.job.JobManager.poll(JobManager.java:197)
org.ofbiz.service.job.JobPoller.run(JobPoller.java:90)
java.lang.Thread.run(Thread.java:619)

I believe that the JobManager is not able to handle all those
jobs to schedule them, so nothing is being scheduled, which of course
makes the job list longer.

Can anyone think of how to make the jobs run?

All help much appreciated,

Re: JobManager failing to schedule jobs

Brett,

Before I start trying to run the jobs manually, I want to give your
suggestion a try. I think I know where to configure the job polling
transaction time (I believe it's the poll-db-millis=2 value in
framework/service/config/serviceengine.xml).

However, I still don't know what to increase it to. I understand that
we wouldn't want to make it bigger than the default polling interval.
Do you know what the default interval between polling is?

Thanks,

On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com wrote:
I meant removing finished jobs. If you have thousands of pending jobs then
you will have the same problem I mentioned in my first email. One
resolution will be to increase the job poller transaction time. In the
ofbiz version I was using there was not a way to configure the poller
transaction time. It just used the default time. I had to create a patch
to allow this to happen.

Brett

On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson
josh.s.jacob...@gmail.com wrote:

Brett,

Can you please explain what you mean by archiving the current JobSandbox
first?
Do you mean somehow removing the current pending jobs, applying your
patch and then copying them back again?

Thanks,

On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer brettgpal...@gmail.com
wrote:
Josh,

I would try archiving jobs from the JobSandbox first.

Brett

On Wed, Jul 13, 2011 at 12:48 PM, Josh Jacobson
josh.s.jacob...@gmail.comwrote:

Hello Everyone,

I have an ofbiz instance in production where none of the jobs are
being performed. I have about 160K jobs in pending status, but they
are never being schedule.
I can see the following in the log:

org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:269)

org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:245)

org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:259)

I believe that the JobManager is not being able to handle all those
jobs to schedule them, so nothing is being scheduled, which of course
make the job list longer.

Can anyone think of how to make the jobs run?

All help much appreciated,

--
Josh.

Re: JobManager failing to schedule jobs

That configuration is for the frequency of job polls. There isn't any ability
to specify the transaction timeout via configuration so you'll need to modify
the code directly:
JobManager.java (line 148):
beganTransaction = TransactionUtil.begin();
needs to be changed to use TransactionUtil.begin(int)

Regards
Scott

HotWax Media
http://www.hotwaxmedia.com
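
A minimal sketch of the change described in the message above, assuming
TransactionUtil.begin(int) takes the timeout in seconds and returns whether a
new transaction was started. The nested TransactionUtil below is only a
stand-in so the example compiles without OFBiz on the classpath, and
POLL_TX_TIMEOUT_SECONDS is a hypothetical constant, not something defined by
the framework.

```java
final class PollTimeoutSketch {

    /** Stand-in for org.ofbiz.entity.transaction.TransactionUtil (assumed method shapes). */
    static final class TransactionUtil {
        static int lastTimeoutSeconds = -1;

        static boolean begin(int timeoutSeconds) {
            // The real class would pass this value on to the transaction manager.
            lastTimeoutSeconds = timeoutSeconds;
            return true;
        }

        static void commit(boolean beganTransaction) {
            // Nothing to do in this stand-in.
        }
    }

    /** Hypothetical constant; 300 is only an example value. */
    static final int POLL_TX_TIMEOUT_SECONDS = 300;

    /** Same shape as the poll: begin with an explicit timeout, do the work, commit. */
    static boolean pollOnce() {
        // before: beganTransaction = TransactionUtil.begin();
        boolean beganTransaction = TransactionUtil.begin(POLL_TX_TIMEOUT_SECONDS);
        // ... select the pending JobSandbox rows and update them here ...
        TransactionUtil.commit(beganTransaction);
        return beganTransaction;
    }

    public static void main(String[] args) {
        pollOnce();
        System.out.println("poll transaction timeout (s): " + TransactionUtil.lastTimeoutSeconds);
    }
}
```

In the real JobManager.java only the begin() call itself would change; the
surrounding error handling stays as it is.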


Re: JobManager failing to schedule jobs

Scott,

Thanks! That is very precise advice. Do you have a suggestion on
interval time? 60 seconds? 120?

Thanks,


Re: JobManager failing to schedule jobs

As best I can tell there shouldn't be any need to increase the interval between 
polls since the interval timer doesn't actually start until the previous poll 
has completed (see JobPoller.run()) so I can't see how a small interval would 
cause any backlog problems.

I'm guessing if there is any lock contention then it's probably caused by the 
executing jobs trying to update their respective rows while the poller is 
holding a table lock.  So from that point of view I guess increasing the 
interval could reduce the amount of contention between the executing jobs and 
the next poll.

Regards
Scott
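
The point about the interval timer can be illustrated with a small timeline
calculation. This is only a sketch of the behaviour described above (wait
starts after the previous poll finishes), not the JobPoller code itself; the
45 second poll duration and 20 second interval in main are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;

final class PollTimelineSketch {

    /**
     * Start times (in ms) of successive polls in a single poller thread,
     * when the interval wait begins only after the previous poll has finished.
     */
    static List<Long> pollStartTimes(long pollDurationMillis, long intervalMillis, int polls) {
        List<Long> starts = new ArrayList<>();
        long now = 0L;
        for (int i = 0; i < polls; i++) {
            starts.add(now);
            now += pollDurationMillis; // the poll runs to completion first
            now += intervalMillis;     // only then does the interval wait begin
        }
        return starts;
    }

    public static void main(String[] args) {
        // Even when a poll takes longer than the interval, polls in one thread never overlap.
        System.out.println(pollStartTimes(45_000L, 20_000L, 3));
    }
}
```

With this model a short interval cannot cause two polls from the same poller
to overlap; it only shortens the quiet period in which executing jobs can
update their rows without competing with the next poll.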

On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:

 Scott,
 
 Thanks! That is very precise advise. Do you have a suggestion on
 interval time? 60 seconds? 120?
 
 Thanks,
 
 On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray scott.g...@hotwaxmedia.com 
 wrote:
 That configuration is for the frequency of job polls.  There isn't any 
 ability to specify the transaction timeout via configuration so you'll need 
 to modify the code directly:
 JobManager.java (line 148):
 beganTransaction = TransactionUtil.begin();
 needs to be changed to use TransactionUtil.begin(int)
 
 Regards
 Scott
 
 HotWax Media
 http://www.hotwaxmedia.com
 
 On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
 
 Brett,
 
 Before I start trying to run the jobs manually, I want to give your
 suggestion a try. I think I know where to configure the job polling
 transaction time (I believe it's the poll-db-millis=2 value in
 framework/service/config/serviceengine.xml).
 
 However, I still don't know what to increase it to. I understand that
 we wouldn't want to make it bigger than the default polling interval.
 Do you know what the default interval between polling is?
 
 Thanks,
 
 On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com 
 wrote:
 I meant removing finished jobs.  If you have thousands of pending jobs then
 you will have the same problem I mentioned in my first email.  One
 resolution will be to increase the job poller transaction time.  In the
 ofbiz version I was using there was not a way to configure the poller
 transaction time.  It just used the default time.  I had to create a patch
 to allow this to happen.
 
 In the patch you had to be careful to not increase the transaction time
 greater than the frequency of the job poller.  Otherwise you get into a lock
 situation where one job poller is still running within a transaction and
 another poller starts.  This didn't create a huge problem but the second job
 poller would usually lock and then time out.
 
 
 
 Brett
 
 
 
 On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson 
 josh.s.jacob...@gmail.comwrote:
 
 Brett,
 
 Can you please explain what you mean by archiving the current JobSandbox
 first?
 Do you mean somehow removing the current pending jobs, applying you
 patch and the copying them back again?
 
 Thanks,
 
 
 On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer brettgpal...@gmail.com
 wrote:
 Josh,
 
 I've also seen this problem if the JobSandbox table has too many rows to
 process.  I ran into a similar problem when I tried to run 10,000 Async
 batch processes.  The time it took for the JobPoller to process all the
 records was too long and the transaction would time out.
 
 I had a patch to change the transaction timeout for the JobPoller
 specifically as it wasn't available in ofbiz at the time, but I don't think
 I ever submitted it.  I could look for this patch if anyone is interested
 but it may already be implemented in the framework.
 
 I would try archiving jobs from the JobSandbox first.
 
 
 Brett
 
 On Wed, Jul 13, 2011 at 12:48 PM, Josh Jacobson
 josh.s.jacob...@gmail.comwrote:
 
 Hello Everyone,
 
 I have an ofbiz instance in production where none of the jobs are
 being performed. I have about 160K jobs in pending status, but they
 are never being schedule.
 I can see the following in the log:
 
 2011-07-13 13:32:01,959 (org.ofbiz.service.job.JobPoller@2599930b) [
 JobManager.java:201:ERROR]  exception report
 -- Transaction
 error trying to commit when polling and updating the JobSandbox:
 org.ofbiz.entity.transaction.GenericTransactionException: Roll back
 error (with no rollbackOnly cause found), could not commit
 transaction, was rolled back instead:
 javax.transaction.RollbackException: Transaction timeout (Transaction
 timeout) Exception:
 org.ofbiz.entity.transaction.GenericTransactionException Message: Roll
 back error (with no rollbackOnly cause found), could not commit
 transaction, was rolled back instead:
 javax.transaction.RollbackException: Transaction timeout (Transaction
 timeout)  cause
 -
 Exception: javax.transaction.RollbackException Message: Transaction
 timeout  stack trace
 ---
 javax.transaction.RollbackException: Transaction

Re: JobManager failing to schedule jobs

Thanks again. I actually meant a suggestion for the transaction
timeout. In any case I am grateful for your explanation.



Re: JobManager failing to schedule jobs

Ah okay, that is entirely dependent on the number of jobs and the speed at
which the server can process them.  As a side note, I would keep a close eye
on the purgeOldJobs service: when it starts falling over (transaction timeout
again), the number of rows in the table will increase quickly, which in turn
will slow down polling.

In general the whole persisted jobs implementation is a bit fragile, especially
when dealing with a large number of jobs.  I've wanted to replace it with
something like Quartz for a while but haven't had the time.

Regards
Scott
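
As a rough illustration of "dependent on the number of jobs and the speed of
the server", the arithmetic could look like the sketch below. It assumes the
poll has to touch every pending row inside one transaction; the throughput of
1000 rows per second and the safety factor of 1.5 are hypothetical and would
need to be measured on the actual server.

```java
final class PollTimeoutEstimate {

    /**
     * Rough lower bound, in whole seconds, for a transaction that has to
     * process pendingRows rows at rowsPerSecond, padded by safetyFactor.
     */
    static long estimateSeconds(long pendingRows, double rowsPerSecond, double safetyFactor) {
        if (rowsPerSecond <= 0.0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("rowsPerSecond must be > 0 and safetyFactor >= 1");
        }
        return (long) Math.ceil(pendingRows / rowsPerSecond * safetyFactor);
    }

    public static void main(String[] args) {
        // 160K pending jobs (from the original report), assumed throughput and padding.
        System.out.println(estimateSeconds(160_000L, 1000.0, 1.5) + " seconds");
    }
}
```

If the estimate comes out far above what is acceptable for a single
transaction, reducing the number of rows the poller sees is likely a better
lever than raising the timeout further.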





Re: JobManager failing to schedule jobs

2011-07-13 Thread Brett Palmer

Josh,

I'm attaching the patch I used to work around this issue.  This is based on
an older version of ofbiz so I would compare your current files carefully.

The following files were patched:

service-config.xsd
serviceengine.xml
JobManager.java
JobPoller.java


The patch allowed for a new configuration option

 poll-transaction-timeout=300

I'm pretty sure that I was using 300 seconds for the
poll-transaction-timeout.  I believe the default is 60 or 120 seconds.

I originally created a JIRA issue 3855 for this problem.

https://issues.apache.org/jira/browse/OFBIZ-3855


If you set the transaction timeout too high, then when the next poller wakes up
to process new requests it will time out, because the first poller still has a
lock on the table (or ofbiz semaphore method).


Here are a couple of other options you could try since the number of pending
jobs is so high.

1. Create a temporary status for the JobSandbox statusId and assign a large
set of pending jobs to this status.  Then incrementally change them back to
pending, a few thousand at a time, so the service engine can process them in
reasonable batches.  I haven't tried this option but it would allow you to
work with the service engine without modifying any code.

2. Start up several more instances of ofbiz all pointing to the same
database.  Each will start a service process to handle more requests in
parallel.  This probably won't work without the patch I've attached, as each
service process would still time out and not allow other processes to start.



Good luck,



Brett
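
Option 1 above can be modelled in memory as follows. This is only a sketch:
the Job class, the SERVICE_HOLD_TMP status and the batch size are
hypothetical, and it assumes pending jobs carry the statusId SERVICE_PENDING.
Against a real database the same two steps would be updates on the JobSandbox
table rather than loops over a list.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchReleaseSketch {

    static final String PENDING = "SERVICE_PENDING"; // assumed statusId of pending jobs
    static final String HOLD = "SERVICE_HOLD_TMP";   // hypothetical temporary status

    /** Minimal in-memory stand-in for a JobSandbox row. */
    static final class Job {
        final String jobId;
        String statusId;

        Job(String jobId, String statusId) {
            this.jobId = jobId;
            this.statusId = statusId;
        }
    }

    /** Parks every pending job under the temporary status; returns how many were parked. */
    static int parkAll(List<Job> jobs) {
        int parked = 0;
        for (Job job : jobs) {
            if (PENDING.equals(job.statusId)) {
                job.statusId = HOLD;
                parked++;
            }
        }
        return parked;
    }

    /** Moves at most batchSize parked jobs back to pending; returns how many were released. */
    static int releaseBatch(List<Job> jobs, int batchSize) {
        int released = 0;
        for (Job job : jobs) {
            if (released == batchSize) {
                break;
            }
            if (HOLD.equals(job.statusId)) {
                job.statusId = PENDING;
                released++;
            }
        }
        return released;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            jobs.add(new Job("JOB_" + i, PENDING));
        }
        System.out.println("parked: " + parkAll(jobs));
        int released;
        while ((released = releaseBatch(jobs, 2)) > 0) {
            // In practice, wait here until the service engine has drained the batch.
            System.out.println("released: " + released);
        }
    }
}
```

The batch size would be chosen so that one poll over the released rows
finishes comfortably inside the transaction timeout.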





Re: JobManager failing to schedule jobs

I tried 60 seconds for the timeout but that didn't work. I guess I'll
double it now and keep trying.

I have about 260,000 pending jobs, and nothing is getting done.

I know what you mean about purgeOldJobs. That service is crashed now
and I deleted old jobs from the database by hand. I was up to 2.6
million rows. Ofbiz was pretty much unusable.

If you have any other suggestions I'd love to hear them.

Re: JobManager failing to schedule jobs

Not sure what db you're using but it probably wouldn't hurt to run a vacuum on 
the table to speed up processing.

By the way, I'm pretty sure the default timeout is 60 seconds so you might want 
to try something a little larger :-)

Regards
Scott

On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:

 I tried 60 seconds for the timeout but that didn't work. I guess I'll
 double it now and keep trying.
 
 I have about 260,000 pending jobs, and nothing is getting done.
 
 I know what you mean about purgeOldJobs. That service is crashing now,
 and I deleted old jobs from the database by hand. I was up to 2.6
 million rows. Ofbiz was pretty much unusable.
 
 If you have any other suggestions I'd love to hear them.
 
 On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote:
 Ah okay, that is entirely dependent on the number of jobs and the speed the 
 server can process them.  As a side note I would keep a close eye on the 
 purgeOldJobs service, when it starts falling over (transaction timeout 
 again) then the number of rows in the table will increase quickly which in 
 turn will slow down polling.
 
 In general the whole persisted jobs implementation is a bit fragile, 
 especially when dealing with a large number of jobs.  I've wanted to replace 
 it with something like quartz for a while but haven't had the time.
 
 Regards
 Scott
 
 On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
 
 Thanks again. I actually meant a suggestion for the transaction
 timeout. In any case I am grateful for your explanation.
 
 
 On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote:
 As best I can tell there shouldn't be any need to increase the interval 
 between polls since the interval timer doesn't actually start until the 
 previous poll has completed (see JobPoller.run()) so I can't see how a 
 small interval would cause any backlog problems.
 
 I'm guessing if there is any lock contention then it's probably caused by 
 the executing jobs trying to update their respective rows while the poller 
 is holding a table lock.  So from that point of view I guess increasing 
 the interval could reduce the amount of contention between the executing 
 jobs and the next poll.
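
 The point above, that the wait between polls only starts once the previous
 poll has finished, can be illustrated with a small simulation. This is a
 simplified stand-in and not the actual JobPoller code: the class name, the
 virtual clock, and the 90-second poll / 20-second interval figures are all
 assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a poll-then-wait loop on a virtual clock.
// It is NOT the OFBiz JobPoller; it only models "wait starts after poll ends".
class SequentialPollSketch {

    /** Returns one {start, end} pair (in virtual milliseconds) per poll. */
    static List<long[]> simulate(int polls, long pollDurationMs, long intervalMs) {
        List<long[]> windows = new ArrayList<>();
        long clock = 0;
        for (int i = 0; i < polls; i++) {
            long start = clock;
            clock += pollDurationMs;              // the poll runs to completion first
            windows.add(new long[] {start, clock});
            clock += intervalMs;                  // only then does the interval elapse
        }
        return windows;
    }

    public static void main(String[] args) {
        // Assumed numbers: each poll takes 90 s, the configured interval is 20 s.
        for (long[] w : simulate(3, 90_000, 20_000)) {
            System.out.println(w[0] + " -> " + w[1]);
        }
    }
}
```

 Even with a poll that takes longer than the interval, each poll in this model
 starts after the previous one has ended, so a short interval alone would not
 make two polls from the same poller overlap.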
 
 Regards
 Scott
 
 On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
 
 Scott,
 
 Thanks! That is very precise advice. Do you have a suggestion on
 interval time? 60 seconds? 120?
 
 Thanks,
 
 On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray scott.g...@hotwaxmedia.com wrote:
 That configuration is for the frequency of job polls.  There isn't any 
 ability to specify the transaction timeout via configuration so you'll 
 need to modify the code directly:
 JobManager.java (line 148):
 beganTransaction = TransactionUtil.begin();
 needs to be changed to use TransactionUtil.begin(int)
 
 Regards
 Scott
 
 HotWax Media
 http://www.hotwaxmedia.com
 
 On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
 
 Brett,
 
 Before I start trying to run the jobs manually, I want to give your
 suggestion a try. I think I know where to configure the job polling
 transaction time (I believe it's the poll-db-millis=2 value in
 framework/service/config/serviceengine.xml).
 
 However, I still don't know what to increase it to. I understand that
 we wouldn't want to make it bigger than the default polling interval.
 Do you know what the default interval between polling is?
 
 Thanks,
 
 On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com wrote:
 I meant removing finished jobs.  If you have thousands of pending jobs then
 you will have the same problem I mentioned in my first email.  One
 resolution will be to increase the job poller transaction time.  In the
 ofbiz version I was using there was not a way to configure the poller
 transaction time.  It just used the default time.  I had to create a patch
 to allow this to happen.
 
 In the patch you had to be careful to not increase the transaction time
 greater than the frequency of the job poller.  Otherwise you get into a lock
 situation where one job poller is still running within a transaction and
 another poller starts.  This didn't create a huge problem but the second job
 poller would usually lock and then time out.
 
 
 
 Brett
 
 
 
 On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:
 
 Brett,
 
 




Re: JobManager failing to schedule jobs

Vacuum has been run (it took quite a while). Yeah, I see now that the
JobManager actually tries to update all the JobSandbox rows in the
transaction, so 60 seconds was pretty low.

I am trying 10 minutes now and will see how that goes.
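
Since the poll updates every JobSandbox row inside one transaction, the timeout
has to cover the whole batch. A rough way to size it is sketched below. The
helper `estimateTimeoutSeconds` is hypothetical and not part of OFBiz, and the
1,000 rows per second update rate and 2x headroom are made-up illustration
values, not measurements from this system. Only the 260,000 pending jobs figure
comes from this thread. The result is in seconds, which as far as I know is the
unit `TransactionUtil.begin(int)` expects.

```java
// Hypothetical sizing helper; not OFBiz code.
class PollTimeoutSketch {

    /**
     * Seconds needed to update `rows` rows at `rowsPerSecond`,
     * multiplied by a safety factor and rounded up.
     */
    static int estimateTimeoutSeconds(long rows, double rowsPerSecond, double safetyFactor) {
        if (rows < 0 || rowsPerSecond <= 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("rows >= 0, rowsPerSecond > 0, safetyFactor >= 1 required");
        }
        return (int) Math.ceil(rows / rowsPerSecond * safetyFactor);
    }

    public static void main(String[] args) {
        // 260,000 pending jobs (from the thread); rate and headroom are assumptions.
        int timeoutSeconds = estimateTimeoutSeconds(260_000, 1_000.0, 2.0);
        System.out.println("timeout (s): " + timeoutSeconds);

        // In JobManager.poll() the value would then replace the no-arg call:
        //   beganTransaction = TransactionUtil.begin(timeoutSeconds);
    }
}
```

Under those assumed numbers the batch alone would need well over the 60-second
default, which fits with 60 seconds not being enough here. The real update rate
would have to be measured on the actual database.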

I am using PostgreSQL, by the way.

Thanks for the help, I really appreciate it.

--
Josh.

Re: JobManager failing to schedule jobs