Re: JobManager failing to schedule jobs
One feature that would help prevent this problem in the future is a configuration parameter in the service engine that sets the maximum number of jobs the poller processes at a time. Right now the poller reads the JobSandbox and gets every job that has a status of Pending. Then it tries to change the status of each of these to running (or something like that). If the number of pending jobs is too large, the poller times out before it can change the state of all the pending jobs. Changing the transaction timeout can help, but another configuration like max-poll-jobs could limit the number of pending jobs that are processed in one transaction. There is a configuration called jobs but I don't think that is used by the polling process.

I've tried to use the service engine as an asynchronous batch server but run into problems when the number of pending jobs gets around 10,000.

Brett

On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman bjf...@free-man.net wrote:

You're going to run into this from time to time for one reason or another. The approach I took was to spread the jobs out so they are not lumped together. Take a look at how the jobs are marshalled to be run.

Josh Jacobson sent the following on 7/13/2011 8:35 PM:

Vacuum has been run (took quite a while). Yeah, I see now that the JobManager actually tries to update all the JobSandbox rows in the transaction, so 60 seconds was pretty low. I am trying 10 minutes now to see how that goes. I am using Postgres, by the way. Thanks for the help, I really appreciate it.

-- Josh.

On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray scott.g...@hotwaxmedia.com wrote:

Not sure what db you're using but it probably wouldn't hurt to run a vacuum on the table to speed up processing. By the way, I'm pretty sure the default timeout is 60 seconds so you might want to try something a little larger :-)

Regards
Scott

On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:

I tried 60 seconds for the timeout but that didn't work. I guess I'll double it now and keep trying. I have about 260,000 pending jobs, and nothing is getting done. I know what you mean about purgeOldJobs. That service has crashed now and I deleted old jobs from the database by hand. I was up to 2.6 million rows. OFBiz was pretty much unusable. If you have any other suggestions I'd love to hear them.

On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote:

Ah okay, that is entirely dependent on the number of jobs and the speed at which the server can process them. As a side note, I would keep a close eye on the purgeOldJobs service: when it starts falling over (transaction timeout again), the number of rows in the table will increase quickly, which in turn will slow down polling. In general the whole persisted jobs implementation is a bit fragile, especially when dealing with a large number of jobs. I've wanted to replace it with something like Quartz for a while but haven't had the time.

Regards
Scott

On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:

Thanks again. I actually meant a suggestion for the transaction timeout. In any case I am grateful for your explanation.

On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote:

As best I can tell there shouldn't be any need to increase the interval between polls, since the interval timer doesn't actually start until the previous poll has completed (see JobPoller.run()), so I can't see how a small interval would cause any backlog problems. I'm guessing that if there is any lock contention, it's probably caused by the executing jobs trying to update their respective rows while the poller is holding a table lock. So from that point of view I guess increasing the interval could reduce the amount of contention between the executing jobs and the next poll.

Regards
Scott

On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:

Scott, thanks! That is very precise advice. Do you have a suggestion on interval time? 60 seconds? 120?

Thanks,

On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray scott.g...@hotwaxmedia.com wrote:

That configuration is for the frequency of job polls. There isn't any ability to specify the transaction timeout via configuration so you'll need to modify the code directly:

JobManager.java (line 148): beganTransaction = TransactionUtil.begin(); needs to be changed to use TransactionUtil.begin(int)

Regards
Scott
HotWax Media
http://www.hotwaxmedia.com

On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:

Brett, before I start trying to run the jobs manually, I want to give your suggestion a try. I think I know where to configure the job polling transaction time (I believe it's the poll-db-millis=2 value in framework/service/config/serviceengine.xml). However, I still don't know what to increase it to. I understand that we
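Brett's max-poll-jobs idea from the top of this message can be sketched in plain Java. This is only an illustration under stated assumptions, not OFBiz code: `BoundedPoller`, `claimBatch`, and the in-memory queue are hypothetical stand-ins for the JobSandbox query and status update, and `maxPollJobs` mirrors the proposed setting, which the thread does not say exists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the poller; this is not the OFBiz JobManager.
final class BoundedPoller {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int maxPollJobs; // assumed setting, mirrors the proposed max-poll-jobs

    BoundedPoller(List<String> pendingJobIds, int maxPollJobs) {
        if (maxPollJobs < 1) {
            throw new IllegalArgumentException("maxPollJobs must be at least 1");
        }
        this.pending.addAll(pendingJobIds);
        this.maxPollJobs = maxPollJobs;
    }

    // Claims at most maxPollJobs pending jobs. In a real poller this whole
    // method would be one short transaction instead of one huge one.
    List<String> claimBatch() {
        List<String> claimed = new ArrayList<>();
        while (claimed.size() < maxPollJobs && !pending.isEmpty()) {
            claimed.add(pending.poll());
        }
        return claimed;
    }

    int remaining() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            ids.add("job-" + i);
        }
        BoundedPoller poller = new BoundedPoller(ids, 10);
        List<String> batch;
        // Each loop iteration stands for one poll.
        while (!(batch = poller.claimBatch()).isEmpty()) {
            System.out.println("claimed " + batch.size() + ", remaining " + poller.remaining());
        }
    }
}
```

With a cap like this, the work done per transaction depends on the cap and not on the size of the backlog, which is the property Brett is after.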
Re: JobManager failing to schedule jobs
I find that anything not time based does not work when, like you said, the numbers get large. I added the createtime to the conditions, currently set in milliseconds.

Brett Palmer sent the following on 7/14/2011 5:35 AM:

One feature that would help prevent this problem in the future is a configuration parameter in the service engine that sets the maximum number of jobs the poller processes at a time. Right now the poller reads the JobSandbox and gets every job that has a status of Pending. Then it tries to change the status of each of these to running (or something like that). If the number of pending jobs is too large, the poller times out before it can change the state of all the pending jobs. Changing the transaction timeout can help, but another configuration like max-poll-jobs could limit the number of pending jobs that are processed in one transaction. There is a configuration called jobs but I don't think that is used by the polling process.

I've tried to use the service engine as an asynchronous batch server but run into problems when the number of pending jobs gets around 10,000.

Brett
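BJ's two remarks in this thread, spreading the jobs out and making the poll conditions time based, are not spelled out in detail. One possible reading, as a rough sketch: give each job its own run time and only pick up jobs whose time has arrived. `StaggeredJobs` and both method names are hypothetical, and the spacing values are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; these names are hypothetical and not part of OFBiz.
final class StaggeredJobs {
    // Gives each job its own run time, spacingMillis apart, instead of one shared time.
    static long[] staggeredRunTimes(long firstRunMillis, int jobCount, long spacingMillis) {
        long[] runTimes = new long[jobCount];
        for (int i = 0; i < jobCount; i++) {
            runTimes[i] = firstRunMillis + i * spacingMillis;
        }
        return runTimes;
    }

    // Time-based poll condition: only jobs whose run time has arrived are picked up.
    static List<Integer> dueJobIndexes(long[] runTimes, long nowMillis) {
        List<Integer> due = new ArrayList<>();
        for (int i = 0; i < runTimes.length; i++) {
            if (runTimes[i] <= nowMillis) {
                due.add(i);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        // Ten jobs, one second apart (made-up numbers).
        long[] runTimes = staggeredRunTimes(0L, 10, 1_000L);
        System.out.println("due at t=2500ms: " + dueJobIndexes(runTimes, 2_500L));
    }
}
```

Because only the due jobs match the condition, a single poll never sees the whole backlog at once.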
Re: JobManager failing to schedule jobs
I should add that the environment also has a lot to do with this. In this area I have changed to solid state drives for storage and 32 GB SDHC for swap files.

BJ Freeman sent the following on 7/14/2011 8:09 AM:

I find that anything not time based does not work when, like you said, the numbers get large. I added the createtime to the conditions, currently set in milliseconds.
Re: JobManager failing to schedule jobs
The key is the transaction timeout. This could be the job length, or it could be the database connection. Please specify the version of OFBiz, since earlier transaction problems were taken care of by changing the code that deals with transactions.

Josh Jacobson sent the following on 7/13/2011 11:48 AM:

Hello Everyone,

I have an OFBiz instance in production where none of the jobs are being performed. I have about 160K jobs in pending status, but they are never being scheduled. I can see the following in the log:

2011-07-13 13:32:01,959 (org.ofbiz.service.job.JobPoller@2599930b) [ JobManager.java:201:ERROR] exception report --
Transaction error trying to commit when polling and updating the JobSandbox: org.ofbiz.entity.transaction.GenericTransactionException: Roll back error (with no rollbackOnly cause found), could not commit transaction, was rolled back instead: javax.transaction.RollbackException: Transaction timeout (Transaction timeout)
Exception: org.ofbiz.entity.transaction.GenericTransactionException
Message: Roll back error (with no rollbackOnly cause found), could not commit transaction, was rolled back instead: javax.transaction.RollbackException: Transaction timeout (Transaction timeout)
cause -
Exception: javax.transaction.RollbackException
Message: Transaction timeout
stack trace ---
javax.transaction.RollbackException: Transaction timeout
org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:269)
org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:245)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:259)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:245)
org.ofbiz.service.job.JobManager.poll(JobManager.java:197)
org.ofbiz.service.job.JobPoller.run(JobPoller.java:90)
java.lang.Thread.run(Thread.java:619)

I believe that the JobManager is not able to handle all those jobs to schedule them, so nothing is being scheduled, which of course makes the job list longer. Can anyone think of how to make the jobs run?

All help much appreciated,
Re: JobManager failing to schedule jobs
Josh,

I've also seen this problem if the JobSandbox table has too many rows to process. I ran into a similar problem when I tried to run 10,000 async batch processes. The time it took for the JobPoller to process all the records was too long and the transaction would time out.

I had a patch to change the transaction timeout for the JobPoller specifically, as it wasn't available in OFBiz at the time, but I don't think I ever submitted it. I could look for this patch if anyone is interested, but it may already be implemented in the framework.

I would try archiving jobs from the JobSandbox first.

Brett

On Wed, Jul 13, 2011 at 12:48 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:

Hello Everyone,

I have an OFBiz instance in production where none of the jobs are being performed. I have about 160K jobs in pending status, but they are never being scheduled. I can see the following in the log:

2011-07-13 13:32:01,959 (org.ofbiz.service.job.JobPoller@2599930b) [ JobManager.java:201:ERROR] exception report --
Transaction error trying to commit when polling and updating the JobSandbox: org.ofbiz.entity.transaction.GenericTransactionException: Roll back error (with no rollbackOnly cause found), could not commit transaction, was rolled back instead: javax.transaction.RollbackException: Transaction timeout (Transaction timeout)
Exception: org.ofbiz.entity.transaction.GenericTransactionException
Message: Roll back error (with no rollbackOnly cause found), could not commit transaction, was rolled back instead: javax.transaction.RollbackException: Transaction timeout (Transaction timeout)
cause -
Exception: javax.transaction.RollbackException
Message: Transaction timeout
stack trace ---
javax.transaction.RollbackException: Transaction timeout
org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:269)
org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:245)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:259)
org.ofbiz.entity.transaction.TransactionUtil.commit(TransactionUtil.java:245)
org.ofbiz.service.job.JobManager.poll(JobManager.java:197)
org.ofbiz.service.job.JobPoller.run(JobPoller.java:90)
java.lang.Thread.run(Thread.java:619)

I believe that the JobManager is not able to handle all those jobs to schedule them, so nothing is being scheduled, which of course makes the job list longer. Can anyone think of how to make the jobs run?

All help much appreciated,

-- Josh.
Re: JobManager failing to schedule jobs
BJ,

I am running 10.04.

On Wed, Jul 13, 2011 at 12:00 PM, BJ Freeman bjf...@free-man.net wrote:

The key is the transaction timeout. This could be the job length, or it could be the database connection. Please specify the version of OFBiz, since earlier transaction problems were taken care of by changing the code that deals with transactions.
Re: JobManager failing to schedule jobs
Brett,

Can you please explain what you mean by archiving the current JobSandbox first? Do you mean somehow removing the current pending jobs, applying your patch and then copying them back again?

Thanks,

On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer brettgpal...@gmail.com wrote:

Josh,

I've also seen this problem if the JobSandbox table has too many rows to process. I ran into a similar problem when I tried to run 10,000 async batch processes. The time it took for the JobPoller to process all the records was too long and the transaction would time out.

I had a patch to change the transaction timeout for the JobPoller specifically, as it wasn't available in OFBiz at the time, but I don't think I ever submitted it. I could look for this patch if anyone is interested, but it may already be implemented in the framework.

I would try archiving jobs from the JobSandbox first.

Brett
Re: JobManager failing to schedule jobs
OK, so you have the latest code. What is the environment you are working with?

OS
Memory
CPU speed

Josh Jacobson sent the following on 7/13/2011 12:12 PM:

BJ,

I am running 10.04.
Re: JobManager failing to schedule jobs
I meant removing finished jobs. If you have thousands of pending jobs then you will have the same problem I mentioned in my first email. One resolution would be to increase the job poller transaction time. In the OFBiz version I was using there was no way to configure the poller transaction time. It just used the default time. I had to create a patch to allow this.

In the patch you had to be careful not to increase the transaction time beyond the frequency of the job poller. Otherwise you get into a lock situation where one job poller is still running within a transaction and another poller starts. This didn't create a huge problem, but the second job poller would usually lock and then time out.

Brett

On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:

Brett,

Can you please explain what you mean by archiving the current JobSandbox first? Do you mean somehow removing the current pending jobs, applying your patch and then copying them back again?

Thanks,
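Brett's warning about a second poller starting while the first is still inside its transaction comes down to how polls are scheduled. The small simulation below contrasts the two cases: polls started at a fixed rate can overlap when one poll runs long, while polls started a fixed delay after the previous one finished cannot (Scott notes elsewhere in this thread that JobPoller.run() behaves the second way). `PollSchedule`, its methods, and all durations are hypothetical and only model the timing; nothing here calls OFBiz.

```java
import java.util.Arrays;

// Timing model only; hypothetical names, not OFBiz code.
final class PollSchedule {
    // Fixed delay: each poll starts `interval` after the previous poll finished.
    static long[] fixedDelayStarts(long[] pollDurations, long interval) {
        long[] starts = new long[pollDurations.length];
        long clock = 0;
        for (int i = 0; i < pollDurations.length; i++) {
            starts[i] = clock;
            clock += pollDurations[i] + interval;
        }
        return starts;
    }

    // Fixed rate: poll i starts at i * interval regardless of earlier polls.
    // Counts polls that begin while the immediately preceding poll is still running.
    static int overlapsAtFixedRate(long[] pollDurations, long interval) {
        int overlaps = 0;
        for (int i = 1; i < pollDurations.length; i++) {
            long previousStart = (i - 1) * interval;
            long previousEnd = previousStart + pollDurations[i - 1];
            long thisStart = i * interval;
            if (thisStart < previousEnd) {
                overlaps++;
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        long[] durations = {90, 10, 10}; // seconds each poll takes (made up)
        long interval = 30;              // seconds between polls (made up)
        System.out.println("fixed-delay starts: " + Arrays.toString(fixedDelayStarts(durations, interval)));
        System.out.println("fixed-rate overlaps: " + overlapsAtFixedRate(durations, interval));
    }
}
```

In the fixed-delay model a longer transaction timeout only postpones the next poll, so the overlap Brett describes would apply to a poller that fires on a fixed rate.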
Re: JobManager failing to schedule jobs
Currently I am running:

Red Hat Enterprise Linux Server release 5.5
6 CPUs, 16384MB RAM

It was very recently upgraded from 2 CPUs and 8GB of RAM because we were having performance issues (lots of swap memory being used). It's on one of those cloud servers. Now it's running without using any swap.

On Wed, Jul 13, 2011 at 12:22 PM, BJ Freeman bjf...@free-man.net wrote:

OK, so you have the latest code. What is the environment you are working with?

OS
Memory
CPU speed
Re: JobManager failing to schedule jobs
On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com wrote:
I meant removing finished jobs. If you have thousands of pending jobs then you will have the same problem I mentioned in my first email. One resolution will be to increase the job poller transaction time. In the OFBiz version I was using there was not a way to configure the poller transaction time. It just used the default time. I had to create a patch to allow this to happen.

I see. I already did that: We had 2.6 million rows in the JobSandbox, mostly completed or failed jobs. We deleted the completed and failed ones and are now looking at about 260L pending jobs. I want to run those jobs, so I can get the machine back to normal.

In the patch you had to be careful not to increase the transaction time beyond the polling interval of the job poller. Otherwise you get into a lock situation where one job poller is still running within a transaction and another poller starts. This didn't create a huge problem but the second job poller would usually lock and then time out.

I understand the possible race condition. So how do I figure out what to set the timeout to, and where do I configure that?

Thanks,
-- Josh.
Re: JobManager failing to schedule jobs
On Wed, Jul 13, 2011 at 12:51 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:
I see. I already did that: We had 2.6 million rows in the JobSandbox, mostly completed or failed jobs. We deleted the completed and failed ones and are now looking at about 260L pending jobs. I want to run those jobs, so I can get the machine back to normal.

Sorry, just noticed the typo: We currently have 260K+ jobs pending and I want to process them to get things back to normal.

Thanks for the help,
-- Josh.
Re: JobManager failing to schedule jobs
You now know why I don't recommend a cloud configuration for real-time operations, unless you're running over dedicated lines that are not part of the internet. To summarize: your environment caused the problem, not OFBiz. Now you have jobs queued that should have been run but have piled up. You need a way to get the jobs run so they don't time out the system. I recommend you look at the purge old jobs service, then copy and modify it to run your jobs, maybe by time group.

Josh Jacobson sent the following on 7/13/2011 12:48 PM:
Currently I am running: Red Hat Enterprise Linux Server release 5.5, 6 CPUs, 16384 MB RAM. It was very recently upgraded from 2 CPUs and 8 GB of RAM because we were having performance issues (lots of swap memory being used). It's on one of those cloud servers. Now it's running without using any swap.
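One way to picture the "by time group" idea above is the sketch below. It is only an illustration under assumptions: the PendingJob class, its field names, and the one-hour window are hypothetical and are not taken from OFBiz or from the purge old jobs service. It shows how a large backlog could be split into smaller groups by scheduled run time, so that each group could be handled in its own short unit of work rather than all at once.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeGroupBatcher {

    /** Hypothetical stand-in for a pending JobSandbox row; not an OFBiz class. */
    static final class PendingJob {
        final String jobId;
        final Instant runTime;

        PendingJob(String jobId, Instant runTime) {
            this.jobId = jobId;
            this.runTime = runTime;
        }
    }

    /** Groups jobs into consecutive windows of the given width, ordered by window start. */
    static Map<Instant, List<PendingJob>> groupByWindow(List<PendingJob> jobs, Duration window) {
        long widthMs = window.toMillis();
        Map<Instant, List<PendingJob>> groups = new TreeMap<>();
        for (PendingJob job : jobs) {
            long t = job.runTime.toEpochMilli();
            // Round the run time down to the start of its window.
            long windowStart = Math.floorDiv(t, widthMs) * widthMs;
            groups.computeIfAbsent(Instant.ofEpochMilli(windowStart), k -> new ArrayList<>()).add(job);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<PendingJob> jobs = List.of(
                new PendingJob("A", Instant.parse("2011-07-13T10:05:00Z")),
                new PendingJob("B", Instant.parse("2011-07-13T10:55:00Z")),
                new PendingJob("C", Instant.parse("2011-07-13T11:10:00Z")));

        // Each group would be processed separately (for example, one group per transaction).
        groupByWindow(jobs, Duration.ofHours(1)).forEach((start, group) ->
                System.out.println(start + " -> " + group.size() + " job(s)"));
    }
}
```

The window width is the knob here: a narrower window means fewer rows per group, which is the property that matters when a single transaction over the whole backlog keeps timing out.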
Re: JobManager failing to schedule jobs
Thanks for the pointers. I'll take a look. There is one more piece of information: the purgeOldJobs service is in a crashed status. Do you think that is significant?

Thanks,

On Wed, Jul 13, 2011 at 4:32 PM, BJ Freeman bjf...@free-man.net wrote:
You now know why I don't recommend a cloud configuration for real-time operations, unless you're running over dedicated lines that are not part of the internet. To summarize: your environment caused the problem, not OFBiz. Now you have jobs queued that should have been run but have piled up. You need a way to get the jobs run so they don't time out the system. I recommend you look at the purge old jobs service, then copy and modify it to run your jobs, maybe by time group.
Re: JobManager failing to schedule jobs
It means it will not purge finished jobs, so you will get a build-up. You can use run service to start it again.

Josh Jacobson sent the following on 7/13/2011 4:41 PM:
Thanks for the pointers. I'll take a look. There is one more piece of information: the purgeOldJobs service is in a crashed status. Do you think that is significant?
Re: JobManager failing to schedule jobs
Thanks, that is what I figured. First things first though: I need to get those jobs running somehow. Thanks for the help.

On Wed, Jul 13, 2011 at 4:46 PM, BJ Freeman bjf...@free-man.net wrote:
It means it will not purge finished jobs, so you will get a build-up. You can use run service to start it again.
Re: JobManager failing to schedule jobs
Brett,

Before I start trying to run the jobs manually, I want to give your suggestion a try. I think I know where to configure the job polling transaction time (I believe it's the poll-db-millis=2 value in framework/service/config/serviceengine.xml). However, I still don't know what to increase it to. I understand that we wouldn't want to make it bigger than the default polling interval. Do you know what the default interval between polls is?

Thanks,

On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer brettgpal...@gmail.com wrote:
I meant removing finished jobs. If you have thousands of pending jobs then you will have the same problem I mentioned in my first email. One resolution will be to increase the job poller transaction time. In the OFBiz version I was using there was not a way to configure the poller transaction time. It just used the default time. I had to create a patch to allow this to happen. In the patch you had to be careful not to increase the transaction time beyond the polling interval of the job poller. Otherwise you get into a lock situation where one job poller is still running within a transaction and another poller starts. This didn't create a huge problem but the second job poller would usually lock and then time out.
Brett

On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote:
Brett, can you please explain what you mean by archiving the current JobSandbox first? Do you mean somehow removing the current pending jobs, applying your patch and then copying them back again?
Thanks,

On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer brettgpal...@gmail.com wrote:
Josh, I've also seen this problem if the JobSandbox table has too many rows to process. I ran into a similar problem when I tried to run 10,000 async batch processes. The time it took for the JobPoller to process all the records was too long and the transaction would time out. I had a patch to change the transaction timeout for the JobPoller specifically, as it wasn't available in OFBiz at the time, but I don't think I ever submitted it. I could look for this patch if anyone is interested, but it may already be implemented in the framework. I would try archiving jobs from the JobSandbox first.
Brett
Re: JobManager failing to schedule jobs
That configuration is for the frequency of job polls. There isn't any way to specify the transaction timeout via configuration, so you'll need to modify the code directly. In JobManager.java (line 148), beganTransaction = TransactionUtil.begin(); needs to be changed to use TransactionUtil.begin(int).

Regards
Scott
HotWax Media
http://www.hotwaxmedia.com

On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
Brett, before I start trying to run the jobs manually, I want to give your suggestion a try. I think I know where to configure the job polling transaction time (I believe it's the poll-db-millis=2 value in framework/service/config/serviceengine.xml). However, I still don't know what to increase it to. I understand that we wouldn't want to make it bigger than the default polling interval. Do you know what the default interval between polls is?
Thanks,
Re: JobManager failing to schedule jobs
Scott,

Thanks! That is very precise advice. Do you have a suggestion on interval time? 60 seconds? 120?

Thanks,

On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray scott.g...@hotwaxmedia.com wrote:
That configuration is for the frequency of job polls. There isn't any way to specify the transaction timeout via configuration, so you'll need to modify the code directly. In JobManager.java (line 148), beganTransaction = TransactionUtil.begin(); needs to be changed to use TransactionUtil.begin(int).
Regards
Scott
HotWax Media
http://www.hotwaxmedia.com
Re: JobManager failing to schedule jobs
As best I can tell, there shouldn't be any need to increase the interval between polls, since the interval timer doesn't actually start until the previous poll has completed (see JobPoller.run()). So I can't see how a small interval would cause any backlog problems. I'm guessing that if there is any lock contention, it's probably caused by the executing jobs trying to update their respective rows while the poller is holding a table lock. From that point of view, I guess increasing the interval could reduce the amount of contention between the executing jobs and the next poll.

Regards
Scott

On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
Scott, Thanks! That is very precise advice. Do you have a suggestion on interval time? 60 seconds? 120?
Thanks,
Re: JobManager failing to schedule jobs
Thanks again. I actually meant a suggestion for the transaction timeout. In any case I am grateful for your explanation.

On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote: As best I can tell there shouldn't be any need to increase the interval between polls since the interval timer doesn't actually start until the previous poll has completed (see JobPoller.run()) so I can't see how a small interval would cause any backlog problems. I'm guessing if there is any lock contention then it's probably caused by the executing jobs trying to update their respective rows while the poller is holding a table lock. So from that point of view I guess increasing the interval could reduce the amount of contention between the executing jobs and the next poll. Regards Scott
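Scott's point that the interval timer only starts once the previous poll has completed can be illustrated with a small timing sketch. This is not the JobPoller code; the class name, the poll durations and the 30-second interval are all made up for illustration. It only assumes the behaviour Scott describes, and it says nothing about several ofbiz instances polling the same database.

```java
import java.util.Arrays;

final class FixedDelayPolling {

    /**
     * Start time of each poll, in ms from the first poll, when the wait
     * between polls begins only after the previous poll has finished.
     */
    static long[] pollStartTimes(long[] pollDurationsMillis, long intervalMillis) {
        long[] starts = new long[pollDurationsMillis.length];
        long clock = 0L;
        for (int i = 0; i < pollDurationsMillis.length; i++) {
            starts[i] = clock;
            // The interval is added on top of the poll's own duration,
            // so a slow poll pushes the next one back instead of overlapping it.
            clock += pollDurationsMillis[i] + intervalMillis;
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] durations = {90_000L, 5_000L, 5_000L}; // hypothetical poll durations
        long interval = 30_000L;                      // hypothetical polling interval
        System.out.println(Arrays.toString(pollStartTimes(durations, interval)));
    }
}
```

With these made-up numbers the second poll starts at 120,000 ms, that is, 30 seconds after the 90-second first poll ends, so within one poller thread a short interval by itself would not make polls pile up.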
Re: JobManager failing to schedule jobs
Ah okay, that is entirely dependent on the number of jobs and the speed the server can process them. As a side note I would keep a close eye on the purgeOldJobs service, when it starts falling over (transaction timeout again) then the number of rows in the table will increase quickly which in turn will slow down polling. In general the whole persisted jobs implementation is a bit fragile, especially when dealing with a large number of jobs. I've wanted to replace it with something like quartz for a while but haven't had the time. Regards Scott

On 14/07/2011, at 2:10 PM, Josh Jacobson wrote: Thanks again. I actually meant a suggestion for the transaction timeout. In any case I am grateful for your explanation.
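Since the right timeout depends on the number of jobs and how fast the server gets through them, a back-of-the-envelope estimate may help when picking a first value to try. The sketch below is only arithmetic: the class and method names are hypothetical, and the 500 rows per second rate and 1.5 safety factor are invented numbers that would need to be measured on the actual database. Only the 160,000 pending jobs figure comes from the original report.

```java
final class PollTimeoutEstimate {

    /**
     * Rough transaction timeout in seconds for updating all pending rows
     * in one transaction, padded by a safety factor.
     */
    static int estimateTimeoutSeconds(long pendingJobs, double rowsPerSecond, double safetyFactor) {
        if (pendingJobs < 0 || rowsPerSecond <= 0.0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("invalid estimate inputs");
        }
        return (int) Math.ceil(pendingJobs / rowsPerSecond * safetyFactor);
    }

    public static void main(String[] args) {
        // 160,000 pending jobs (from the thread); rate and factor are assumptions.
        System.out.println(estimateTimeoutSeconds(160_000L, 500.0, 1.5));
    }
}
```

Under those assumed numbers the estimate comes out at 480 seconds, which is at least consistent with 60 seconds being too low for a backlog of this size.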
Re: JobManager failing to schedule jobs
Josh, I'm attaching the patch I used to work around this issue. This is based on an older version of ofbiz so I would compare your current files carefully. The following files were patched:

service-config.xsd
serviceengine.xml
JobManager.java
JobPoller.java

The patch allowed for a new configuration option poll-transaction-timeout=300. I'm pretty sure that I was using 300 seconds for the poll-transaction-timeout. I believe the default is 60 or 120 seconds. I originally created JIRA issue 3855 for this problem. https://issues.apache.org/jira/browse/OFBIZ-3855

If you set the transaction timeout too high, then when the poller wakes up to process new requests it will time out because the first poller has a lock on the table (or ofbiz semaphore method).

Here are a couple of other options you could try since the number of pending jobs is so high.

1. Create a temporary status for the JobSandbox statusId and assign a large set of the pending jobs to this status. Then only process a few thousand at a time: you can incrementally change these back to pending so the service engine can process them in reasonable batches. I haven't tried this option but it would allow you to work with the service engine without modifying any code.

2. Start up several more instances of ofbiz all pointing to the same database. Each will start a service process to process more requests in parallel. This probably won't work without the patch I've attached as each service process would still time out and not allow other processes to start.

Good luck, Brett

On Wed, Jul 13, 2011 at 8:10 PM, Josh Jacobson josh.s.jacob...@gmail.com wrote: Thanks again. I actually meant a suggestion for the transaction timeout. In any case I am grateful for your explanation.
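Brett's first option (park the backlog under a temporary status, then feed it back to the poller a few thousand at a time) can be sketched in memory as follows. This is only an illustration of the bookkeeping, not ofbiz code: the Job class stands in for a JobSandbox row, and the PENDING and ON_HOLD labels are placeholders rather than real statusId values. In practice the same two steps would be UPDATE statements against the table.

```java
import java.util.ArrayList;
import java.util.List;

final class JobBatchRelease {
    // Placeholder labels; not necessarily the real JobSandbox statusId values.
    static final String PENDING = "PENDING";
    static final String ON_HOLD = "ON_HOLD";

    /** Minimal in-memory stand-in for one JobSandbox row. */
    static final class Job {
        final int jobId;
        String statusId;

        Job(int jobId, String statusId) {
            this.jobId = jobId;
            this.statusId = statusId;
        }
    }

    /** Builds n jobs that all start out pending. */
    static List<Job> newPendingJobs(int n) {
        List<Job> jobs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            jobs.add(new Job(i, PENDING));
        }
        return jobs;
    }

    /** Moves every pending job to the temporary status; returns how many moved. */
    static int parkPending(List<Job> jobs) {
        int parked = 0;
        for (Job job : jobs) {
            if (PENDING.equals(job.statusId)) {
                job.statusId = ON_HOLD;
                parked++;
            }
        }
        return parked;
    }

    /** Moves at most batchSize held jobs back to pending; returns how many moved. */
    static int releaseBatch(List<Job> jobs, int batchSize) {
        int released = 0;
        for (Job job : jobs) {
            if (released >= batchSize) {
                break;
            }
            if (ON_HOLD.equals(job.statusId)) {
                job.statusId = PENDING;
                released++;
            }
        }
        return released;
    }

    /** Counts jobs currently carrying the given status. */
    static long countWithStatus(List<Job> jobs, String statusId) {
        return jobs.stream().filter(j -> statusId.equals(j.statusId)).count();
    }
}
```

The idea would be to call releaseBatch again only after the previous batch has drained, so the poller never sees more pending rows than it can update within one transaction timeout.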
Re: JobManager failing to schedule jobs
I tried 60 seconds for timeout but that didn't work. I guess I'll double it now and keep trying. I have about 260,000 pending jobs, and nothing is getting done. I know what you mean about purgeOldJobs. That service is crashed now and I deleted old jobs from the database by hand. I was up to 2.6 million rows. Ofbiz was pretty much unusable. If you have any other suggestions I'd love to hear them.

On Wednesday, July 13, 2011, Scott Gray scott.g...@hotwaxmedia.com wrote: Ah okay, that is entirely dependent on the number of jobs and the speed the server can process them. As a side note I would keep a close eye on the purgeOldJobs service, when it starts falling over (transaction timeout again) then the number of rows in the table will increase quickly which in turn will slow down polling. In general the whole persisted jobs implementation is a bit fragile, especially when dealing with a large number of jobs. I've wanted to replace it with something like quartz for a while but haven't had the time. Regards Scott
Re: JobManager failing to schedule jobs
Not sure what db you're using but it probably wouldn't hurt to run a vacuum on the table to speed up processing. By the way, I'm pretty sure the default timeout is 60 seconds so you might want to try something a little larger :-) Regards Scott

On 14/07/2011, at 2:58 PM, Josh Jacobson wrote: I tried 60 seconds for timeout but that didn't work. I guess I'll double it now and keep trying. I have about 260,000 pending jobs, and nothing is getting done. I know what you mean about purgeOldJobs. That service is crashed now and I deleted old jobs from the database by hand. I was up to 2.6 million rows. Ofbiz was pretty much unusable. If you have any other suggestions I'd love to hear them.
Re: JobManager failing to schedule jobs
Vacuum has been run (took quite a while). Yeah, I see now that the JobManager actually tries to update all the JobSandbox rows in the transaction, so 60 seconds was pretty low. I am trying 10 minutes now and will see how that goes. I am using Postgres by the way. Thanks for the help, I really appreciate it. -- Josh.

On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray scott.g...@hotwaxmedia.com wrote: Not sure what db you're using but it probably wouldn't hurt to run a vacuum on the table to speed up processing. By the way, I'm pretty sure the default timeout is 60 seconds so you might want to try something a little larger :-) Regards Scott
Re: JobManager failing to schedule jobs
You're going to run into this from time to time for one reason or another. The approach I took was to spread the jobs out so they are not lumped together. Take a look at how the jobs are marshalled to be run.

Josh Jacobson sent the following on 7/13/2011 8:35 PM: Vacuum has been run (took quite a while). Yeah, I see now that the JobManager actually tries to update all the JobSandbox rows in the transaction, so 60 seconds was pretty low. I am trying 10 minutes now and will see how that goes. I am using Postgres by the way. Thanks for the help, I really appreciate it. -- Josh.
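One way to read "spread the jobs out" is to give each job its own run time instead of scheduling everything for the same moment. The sketch below only computes evenly spaced times. It is not how ofbiz marshals jobs; the class name, the method name and the 30-second spacing are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class StaggeredSchedule {

    /**
     * Returns jobCount run times, the first at 'first' and each later one
     * 'spacing' after the one before, so jobs do not all come due at once.
     */
    static List<Instant> staggeredRunTimes(Instant first, int jobCount, Duration spacing) {
        if (jobCount < 0 || spacing.isNegative()) {
            throw new IllegalArgumentException("jobCount and spacing must be non-negative");
        }
        List<Instant> runTimes = new ArrayList<>(jobCount);
        for (int i = 0; i < jobCount; i++) {
            runTimes.add(first.plus(spacing.multipliedBy(i)));
        }
        return runTimes;
    }

    public static void main(String[] args) {
        // Hypothetical: three jobs, 30 seconds apart, starting now.
        for (Instant t : staggeredRunTimes(Instant.now(), 3, Duration.ofSeconds(30))) {
            System.out.println(t);
        }
    }
}
```

Each poll would then only pick up the jobs whose run time has arrived, rather than the whole backlog in one transaction.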