Adam Heath wrote:
Adrian Crum wrote:
Adam Heath wrote:
Adrian Crum wrote:
The java.util.concurrent package rocks! I used it a few weeks ago to
multi-thread the demo data loading code. I got it down from 3 minutes to
1.5 minutes.
What?  You made the ofbiz demo data loading code multi-threaded?
Seriously?  If so, that rocks!
I used a thread pool to create tables and non-fk indexes. By fine tuning
the thread count, I was able to take the single-threaded CPU usage from
12-20% up to 50-90%. I used a FIFO queue for loading data - the main
thread parses the XML files and places DOM Elements in the queue, and
another thread takes the elements from the queue and stores them in the
database.
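A minimal sketch of the FIFO hand-off described above, using only the JDK. The calling thread parses each file and puts DOM Elements on a `LinkedBlockingQueue`, and one writer thread takes them off and stores them. Several details are assumptions for illustration and do not come from the actual patch: the class name, the hypothetical `store` method (it only records `tagName:id` in a list instead of writing to a database), the queue capacity of 1000, and the use of `Optional.empty()` as an end-of-data marker. The thread pool for table and index creation is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QueueLoaderSketch {
    // Hypothetical stand-in for "store in the database": records entity name + id.
    static void store(Element e, List<String> db) {
        db.add(e.getTagName() + ":" + e.getAttribute("id"));
    }

    public static List<String> load(List<String> xmlFiles) throws Exception {
        // Bounded FIFO queue: put() blocks if the parser gets too far ahead of the writer.
        BlockingQueue<Optional<Element>> queue = new LinkedBlockingQueue<>(1000);
        List<String> db = Collections.synchronizedList(new ArrayList<>());

        // Writer thread: takes elements in FIFO order until it sees the end marker.
        Thread writer = new Thread(() -> {
            try {
                for (Optional<Element> next = queue.take(); next.isPresent(); next = queue.take()) {
                    store(next.get(), db);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Main thread: parses each file and feeds its top-level elements to the queue.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (String xml : xmlFiles) {
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            // DOM is not thread-safe: finish walking this document before handing
            // its elements over, so only the writer touches them once queued.
            List<Element> values = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    values.add((Element) children.item(i));
                }
            }
            for (Element e : values) {
                queue.put(Optional.of(e));
            }
        }
        queue.put(Optional.empty()); // signal "no more data"
        writer.join();
        return db;
    }
}
```

With a single writer and a FIFO queue, values are stored in the order they were parsed, so the existing file ordering still matters.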

Some day I'll clean up the code and provide a patch. It only benefits
multi-CPU computers.

I would do this in multiple stages.

First stage would be a generic xml parsing service.  Each xml file is
handed off to an ExecutorService.  The Callable.call() method would
then parse the file, and the return would be a Document.
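A sketch of that first stage, assuming each file's content is already available as a String (hypothetical; real code would more likely pass a File or URL). One `Callable<Document>` per file is submitted to a fixed-size `ExecutorService`, and the caller collects the Documents through the returned `Future`s. The class and method names and the `threads` parameter are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseStageSketch {
    // One parsing task per file; call() returns the parsed Document.
    static Callable<Document> parseTask(String xml) {
        // DocumentBuilder is not thread-safe, so each task creates its own.
        return () -> DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static List<Document> parseAll(List<String> xmlFiles, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Document>> futures = new ArrayList<>();
            for (String xml : xmlFiles) {
                futures.add(executor.submit(parseTask(xml)));
            }
            List<Document> docs = new ArrayList<>();
            for (Future<Document> f : futures) {
                // Blocks until that file is parsed; a parse error surfaces as ExecutionException.
                docs.add(f.get());
            }
            return docs;
        } finally {
            executor.shutdown();
        }
    }
}
```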

The second phase would then use the same ExecutorService, and convert
each Document to a List<GenericValue>.  As an optimization, the first
phase would auto-submit the document back to the same executor.
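One way to get the "auto-submit back to the same executor" behaviour with the JDK is `CompletableFuture.supplyAsync(..., executor).thenApplyAsync(..., executor)`, which queues the conversion on the same pool as soon as parsing finishes. Note that this swaps in `CompletableFuture` for a hand-written `Callable` that resubmits its result. `Value` is a hypothetical stand-in for OFBiz's `GenericValue` (entity name plus attribute map), the conversion rule (one value per top-level element, attributes as fields) is an assumption, and the sketch needs Java 16+ for the record and the `instanceof` pattern.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TwoStageSketch {
    // Hypothetical stand-in for GenericValue: entity name plus field values.
    public record Value(String entity, Map<String, String> fields) {}

    // Phase 1: parse one file into a Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable data file", e);
        }
    }

    // Phase 2: turn each top-level element into a Value (attributes become fields).
    static List<Value> toValues(Document doc) {
        List<Value> values = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element e) { // skips whitespace text nodes
                Map<String, String> fields = new LinkedHashMap<>();
                NamedNodeMap attrs = e.getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    fields.put(attrs.item(j).getNodeName(), attrs.item(j).getNodeValue());
                }
                values.add(new Value(e.getTagName(), fields));
            }
        }
        return values;
    }

    public static List<List<Value>> load(List<String> xmlFiles, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<Value>>> pending = new ArrayList<>();
            for (String xml : xmlFiles) {
                // When parsing finishes, the conversion is queued on the same executor.
                pending.add(CompletableFuture.supplyAsync(() -> parse(xml), executor)
                        .thenApplyAsync(TwoStageSketch::toValues, executor));
            }
            List<List<Value>> result = new ArrayList<>();
            for (CompletableFuture<List<Value>> f : pending) {
                result.add(f.join());
            }
            return result;
        } finally {
            executor.shutdown();
        }
    }
}
```

Each Document is touched by only one task at a time (parse, then convert), and the future chain provides the hand-off between threads, so no extra locking is needed around the DOM.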

Third phase would then import files in parallel, but not the separate
values.  You'd have to handle dependency issues, similar to the
looping that is currently done.  However, the correct fix for these
kinds of problems would be to reorder the data in the files.
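A sketch of the third phase under loudly hypothetical assumptions: `Row` and `FakeDb` stand in for entity values and the database, and a store "fails" when the referenced parent row is not there yet, mimicking a foreign-key violation. Files are imported in parallel (one task per file, rows within a file in order), and rows that fail are retried in another round, stopping once a round stores nothing. This illustrates the idea of a retry loop; it is not OFBiz's actual loading loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelImportSketch {
    // Hypothetical row: its own key plus the key of the row it references (null if none).
    public record Row(String key, String parentKey) {}

    // Hypothetical database: tryStore() returns false while the parent row is missing.
    public static final class FakeDb {
        public final Set<String> keys = ConcurrentHashMap.newKeySet();

        public boolean tryStore(Row r) {
            if (r.parentKey() != null && !keys.contains(r.parentKey())) {
                return false;
            }
            keys.add(r.key());
            return true;
        }
    }

    // Returns the rows that could never be stored.
    public static List<Row> importAll(List<List<Row>> files, FakeDb db, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<List<Row>> remaining = files;
            int pending = count(remaining);
            while (pending > 0) {
                // One task per file; each returns the rows it had to defer.
                List<Future<List<Row>>> futures = new ArrayList<>();
                for (List<Row> file : remaining) {
                    Callable<List<Row>> task = () -> {
                        List<Row> deferred = new ArrayList<>();
                        for (Row r : file) {
                            if (!db.tryStore(r)) {
                                deferred.add(r); // parent not loaded yet; retry next round
                            }
                        }
                        return deferred;
                    };
                    futures.add(executor.submit(task));
                }
                List<List<Row>> next = new ArrayList<>();
                for (Future<List<Row>> f : futures) {
                    next.add(f.get());
                }
                int stillPending = count(next);
                remaining = next;
                if (stillPending == pending) {
                    break; // no progress this round: the rest can never load
                }
                pending = stillPending;
            }
            List<Row> unresolved = new ArrayList<>();
            remaining.forEach(unresolved::addAll);
            return unresolved;
        } finally {
            executor.shutdown();
        }
    }

    static int count(List<List<Row>> files) {
        return files.stream().mapToInt(List::size).sum();
    }
}
```

Reordering the data so that parent rows come before the rows that reference them would let most rows succeed in the first round, leaving little for the retry rounds to do.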

I'm sure all kinds of optimizations could be tried. Once you have the basic multi-threading working, tweaking it becomes addictive.

My stab at it served two purposes: I had just read the Sun tutorial on the concurrent package and wanted to try it out, and I really needed to reduce the demo data load time because I was using the process to test the converter integration in the entity engine.

-Adrian
