Adrian Crum wrote:
> I ran my patch against your recent changes and the errors went away. I
> guess we can consider that issue resolved.
Yeah, I did do some changes to SequenceUtil a while back. The biggest
functional change was to move some variables from the inner class to the
outer class, and not try to access them all the time.

> As far as the approach I took to multi-threading the data load - here is
> an overview:
>
> I was able to run certain tasks in parallel - creating entities and
> creating primary keys, for example. I have the number of threads
> allocated configured in a properties file. By tweaking that number I was
> able to increase CPU utilization and reduce the creation time. Of course
> there was a threshold where CPU utilization was raised but creation time
> increased - due to thread thrash.

So each entity creation itself was a separate work unit. Once an entity
was created, you could submit the primary key creation as well. That's
simple enough to implement (in theory, anyway). This design is starting
to go towards the Sandstorm(1) approach. There are ways to find out how
many CPUs are available. Look at
org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it calls
into ManagementFactory.

> Creating foreign keys must be run on a single thread to prevent database
> deadlocks.

Maybe. If the entity and primary keys are all created for both sides of
the foreign key, then shouldn't it be possible to submit that work unit to
the pool as well?

> I multi-threaded the data load by having one thread parse the XML files
> and put the results in a queue. Another thread services the queue and
> loads the data. I also multi-threaded the EECAs - but that has an issue
> I need to solve.

Hmm. You dug deeper, splitting the work up into separate calls. I hadn't
done that yet, and just handed each XML file to a separate thread. My
approach is obviously wrong.

> My original goal was to reduce the ant clean-all + ant run-install cycle
> time. I recently purchased a much faster development machine that
> completes the cycle in about 2 minutes - slightly longer than the
> multi-threaded code, so I don't have much of an incentive to develop the
> patch further.

I've reduced the time it takes to do a run-tests loop. The changes I've
made to log4j.xml reduce the *extreme* debug logging produced by several
classes. log4j would create a new exception just so it could get the
correct class and line number to print to the log, which is a heavy-weight
operation. This mostly showed up as slowness when Catalina started up, so
this set of changes doesn't directly affect the run-install cycle.

> The whole experience was an educational one. There is a possibility the
> techniques I developed could be used to speed up import/export of large
> datasets. If anyone is interested in that, I am available for hire.

We have a site where users could upload original images (6), then fill out
a bunch of form data, and then some PDFs would be generated. I would
submit a bunch of image resize operations (I had to make 2 reduced-size
images for each of the originals). All of those can run in parallel. Then,
once all the images were done, the 2 PDFs would be submitted. This entire
pipeline might itself be run in parallel too, since the user could have
multiple such records that needed updating.

1: http://www.eecs.harvard.edu/~mdw/proj/seda/
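
To make the work-unit idea a bit more concrete, here is a rough sketch
using plain java.util.concurrent rather than the real delegator or
ExecutionPool APIs (createEntity and createPrimaryKey are just
placeholders; the pool sizing reads the processor count through
ManagementFactory, which is the same information getNewOptimalExecutor()
uses):

import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class EntityLoadSketch {

    public static void main(String[] args) throws Exception {
        // Size the pool from the number of available processors - the same
        // information ExecutionPool.getNewOptimalExecutor() obtains via
        // ManagementFactory.
        int cpus = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);

        List<String> entityNames = Arrays.asList("Party", "Product", "OrderHeader");
        List<Future<?>> entityFutures = new ArrayList<Future<?>>();
        for (final String name : entityNames) {
            // Each entity creation is its own work unit; once it finishes,
            // the primary key creation for that entity is submitted as a
            // follow-up work unit on the same pool.
            entityFutures.add(pool.submit(() -> {
                createEntity(name);                        // placeholder
                pool.submit(() -> createPrimaryKey(name)); // placeholder
            }));
        }
        for (Future<?> f : entityFutures) {
            f.get(); // propagate any failure from the entity work units
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // let the PK work units drain
    }

    private static void createEntity(String name) { /* placeholder */ }

    private static void createPrimaryKey(String name) { /* placeholder */ }
}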

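And here is a minimal sketch of the parse-thread/load-thread split you
describe: one producer parsing files onto a BlockingQueue, one consumer
servicing the queue (parseFile and loadValue are placeholders, not the
real OFBiz data-load calls):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ParseLoadQueueSketch {

    // Poison pill that tells the loader thread the parser is finished.
    private static final String EOF = new String("EOF");

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);
        final List<String> xmlFiles = Arrays.asList("DemoProduct.xml", "DemoOrder.xml");

        // Producer: parses the XML files and puts the parsed values on the queue.
        Thread parser = new Thread(() -> {
            try {
                for (String file : xmlFiles) {
                    for (String value : parseFile(file)) { // placeholder
                        queue.put(value);
                    }
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: services the queue and loads each value as it arrives.
        Thread loader = new Thread(() -> {
            try {
                for (String value = queue.take(); value != EOF; value = queue.take()) {
                    loadValue(value);                      // placeholder
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        loader.start();
        parser.join();
        loader.join();
    }

    private static List<String> parseFile(String file) { return new ArrayList<String>(); }

    private static void loadValue(String value) { /* placeholder */ }
}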