Jacques,

It sounds like what Adrian implemented would solve a lot of our problems
with the service engine.  Please see my comments inline.

On Wed, Aug 8, 2012 at 3:54 PM, Jacques Le Roux <
[email protected]> wrote:

> Hi Brett,
>
> Interesting...
>
> Brett Palmer wrote:
>
>> Jacques,
>>
>> I had to review some of my notes to remember what we were trying to do
>> with
>> the JobSandbox.  Here are my replies to your questions:
>>
>> 1. Did you use the purge-job-days setting in serviceengine.xml and the
>> related  purgeOldJobs? If not was there a reason?
>>
>> We were not using the purgeOldJobs service.  This was probably because we
>> didn’t understand how the service worked.  We may have thought the service
>> was specific to order only jobs which would not have worked for us.  Our
>> jobs are custom service jobs for the particular application we are
>> developing.
>>
>
> I agree with Adrian, this can be perfected, using a smart dynamic way of
> purging old jobs during Job Poller idle periods
>
>
Yes, a smart poller would work and avoid conflicts during heavy transaction
times.
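As a rough sketch of what an idle-aware purge could look like (plain Java; the Job record and the queue-depth check are stand-ins I made up, not the actual OFBiz JobManager API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch: purge finished JobSandbox rows only while the poller is idle,
// so the DELETEs never compete with job dispatch during peak load.
public class IdleAwarePurger {
    // Stand-in for the purge-job-days setting, expressed in millis.
    public static final long PURGE_AGE_MILLIS = TimeUnit.DAYS.toMillis(30);

    // A job record reduced to the two fields the purge cares about.
    public record Job(String status, long finishTime) {}

    /** Remove finished jobs older than the cutoff, but only when idle. */
    public static int purge(List<Job> sandbox, int queuedJobs, long now) {
        if (queuedJobs > 0) {
            return 0; // poller is busy: skip this cycle entirely
        }
        int purged = 0;
        Iterator<Job> it = sandbox.iterator();
        while (it.hasNext()) {
            Job j = it.next();
            if ("FINISHED".equals(j.status()) && now - j.finishTime() > PURGE_AGE_MILLIS) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }
}
```

The point is only the gating: the purge runs in the same cycle as the poller, but bails out immediately whenever there is queued work.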


>
>> One problem that we had with most jobs that hit the JobSandbox (including
>> the poller) was that it appeared they were doing full table scans instead
>> of an indexed scan.  These would cause problems for us when the JobSandbox
>> grew larger and especially during heavy production days.  We would often
>> see transaction locks on the JobSandbox and I/O bottlenecks on the server
>> in general due to the scans.  The purgeOldJobs service may be a good
>> solution for that if we could keep the JobSandbox to a reasonable number
>> of
>> records.
>>
>> I created issue OFBIZ-3855 on this a couple of years ago when we tried to
>> use the JobSandbox as a batch process service for multiple application
>> servers.  We were filling up the JobSandbox with 100k records over a
>> short period of time.  The poller was getting transaction timeouts before
>> it could change the status of the next available job to process.  I
>> created
>> a patch to allow a user to customize the transaction timeout for the
>> poller.  I thought I had submitted this patch, but looking at the Jira
>> issue it doesn’t look like it was ever submitted.
>>
>
> I put a comment there. I browsed (I can't really say reviewed) Adrian's
> recent work, after Jacopo's, and it seems to me that it should address your
> problem. Or at least it is a sound foundation for that...
>
>
>> In the end we changed how we did our data warehouse processing.
>> Increasing the transaction timeout didn’t really solve the problem
>> either; it just made it possible to extend the timeout length, which can
>> have other consequences in the system.
>>
>> If the community is still interested in the patch I can submit it to Jira
>> for a recent version from the trunk.
>>
>>
>> 2. Configuring service engine to run with multiple job pools.
>>
>> As I’m looking at my notes I believe the problem with configuring the
>> service engine with multiple job pools was that there wasn’t an API to run
>> a service (async or synchronous) to a specific job service pool.  You
>> could
>> schedule a job to run against a particular pool.
>>
>> For example in the serviceengine.xml file you can configure a job to run
>> in
>> a particular job pool like the following:
>>
>>        <startup-service name="testScv" runtime-data-id="9900"
>> runtime-delay="0"
>> run-in-pool="pool"/>
>>
>> You can also use the LocalDispatcher.schedule() method to schedule a job
>> to
>> run in a particular pool.
>>
>> What we needed was a way to configure our app servers to service different
>> service pools but allow all app servers to request the service
>> dynamically.
>>
>
> I see, you want to have this dynamically done with an API, to better
> handle where the jobs are running, not statically as done by the
> thread-pool attribute.
>
>
>
At the time we were looking for a method on the service dispatcher to run
an async service and assign it to a particular pool.  For example,
localDispatcher.Async("PoolName", other params...).  This was for our data
warehouse process, which we wanted to be as close to real time as possible.
For our application we would run multiple application servers talking to
the same database.  During heavy usage periods we could not have all app
servers servicing these asynchronous requests, as they would be competing
for limited resources on our database.
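To make the idea concrete, here is a minimal sketch of the kind of API we were after, with plain java.util.concurrent executors standing in for service pools (the real LocalDispatcher has no such overload; all names below are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: route an async "service" to a named pool, so only the app
// servers that register a pool ever execute that pool's work.
public class PooledDispatcher {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    /** An app server opts in to servicing a pool by registering it. */
    public void registerPool(String poolName, int maxConcurrent) {
        pools.put(poolName, Executors.newFixedThreadPool(maxConcurrent));
    }

    /** Run asynchronously in a specific pool, e.g. runAsync("dw-pool", job). */
    public Future<?> runAsync(String poolName, Runnable job) {
        ExecutorService pool = pools.get(poolName);
        if (pool == null) {
            throw new IllegalStateException("this server does not service pool " + poolName);
        }
        return pool.submit(job);
    }

    /** Stop accepting work; lets the JVM exit cleanly. */
    public void shutdownAll() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

An app server dedicated to data warehouse work would register only "dw-pool"; the other servers could still request the service, but the request would be queued for the registered servers rather than executed locally.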



>> This would allow us to limit the number of concurrent services that were
>> run in our system.
>>
>
> If I understand correctly what you mean by "concurrent services" (I guess
> you mean jobs): when I want to avoid running concurrent services, I set the
> semaphore service attribute to "fail". Since it uses the ServiceSemaphore,
> it should span all service managers and thread-pools which use the same DB.
> So far I have not run into issues with that, but maybe it can also be
> improved, notably to guard against collisions in the DB using SELECT FOR
> UPDATE.
>
>
Yes, I mean jobs in the JobSandbox.  I was not aware of the semaphore
service attribute which would have helped.  We ended up implementing a
custom "SELECT for UPDATE" method with our servers with a semaphore table
to prevent more than one data warehouse process running on a single
application server.
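For reference, the attribute Jacques mentions goes on the service definition; roughly like this (service name, location, and invoke values are illustrative):

```xml
<!-- semaphore="fail" makes a second invocation fail immediately while one
     instance is already running; "wait" would block instead. -->
<service name="runDataWarehouseLoad" engine="java"
        location="com.example.dw.DwServices" invoke="runLoad"
        semaphore="fail">
    <description>Data warehouse load, at most one instance at a time</description>
</service>
```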

We scheduled this service to run once every 5 minutes using the normal
OFBiz scheduler.  The problem was that during high loads the process would
often not complete before the service engine started another instance.  We
set a flag in a semaphore table to limit ourselves to a single data
warehouse process per server.  Perhaps the semaphore service attribute
could have done the same thing.
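The try-acquire/release cycle of our semaphore table looked roughly like the following, modeled here in-memory with an AtomicBoolean; in the real implementation the compareAndSet was a SELECT ... FOR UPDATE plus UPDATE in one transaction, so the row lock serialized concurrent attempts:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: at most one data warehouse run at a time per server. The
// AtomicBoolean stands in for the in_use column of the semaphore table.
public class DwSemaphore {
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    /** Returns true only for the first caller; later callers skip the run. */
    public boolean tryAcquire() {
        return inUse.compareAndSet(false, true);
    }

    public void release() {
        inUse.set(false);
    }

    /** Run the warehouse load only if no other run is in flight. */
    public boolean runIfFree(Runnable load) {
        if (!tryAcquire()) {
            return false; // a previous 5-minute run is still going
        }
        try {
            load.run();
            return true;
        } finally {
            release();
        }
    }
}
```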


>
>> The default service engine lets all the app servers service the
>> JobSandbox, which doesn’t scale well for us during heavy production days.
>>
>
> Not sure I understand: you mean that assigning services to thread-pools
> has no effect? From your explanation above, I rather guess it was not
> sufficient.
>
>
>> This is one of the reasons we liked the idea of a JMS integration with
>> the service engine.  Then we could start up processes to listen to
>> specific queues, and our application could write to the different queues.
>> This would allow us to control the number of concurrent services
>> processed at a time.
>>
>
> There is already a JMS integration with the service engine. I use it for
> the DCC:
> https://cwiki.apache.org/confluence/display/OFBIZ/Distributed+Entity+Cache+Clear+Mechanism
> You want something more flexible, like the "dynamic thread-pool API" you
> suggested, more integrated?
>
>
Great article on using JMS with OFBiz.  This is something we can use, as we
do a lot of multi-server implementations with OFBiz.


Thanks for your help.  I'll take a look at the recent commits and post any
questions I have to the list.



Brett
