Adam Heath wrote:
As Adrian and I previously discussed, he said he had discovered some
possible problems with SequenceUtil in multi-threaded situations.  He
discovered this when he made EntityDataLoadContainer load each xml
file in a thread.

I've recently done the same on my local copy, but I don't see any
problems.  What I did see, however, was that just throwing every xml
data file into a thread (actually, a 4-thread pool) caused errors
loading some files, because each file has an implicit dependency on
some other set of files, and those files hadn't been loaded yet.

So, before doing a thread load, the files would have to have an
explicit dependency listed, so that correct ordering could be done.
This is not something that would make ofbiz easier to use.

Trying to figure out the implicit dependencies automatically by
comparing each entity line isn't worthwhile, as that would be
reimplementing a database, and what would be the point.

So, Adrian, if you have any more pointers as to what your original
change did, I'd appreciate any insight you might have.  Otherwise, I
will say that we can't load data in parallel.

Additionally, I suspected that SequenceUtil actually *didn't* have
any problems.  I wrote a test case quite a while back that did
multi-threaded testing of SequenceUtil, and it never had any problems.
It used 100 threads, with each thread trying to allocate 1000
sequence values.
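
A test of that shape could be sketched as follows. This is not the
original test case: the class name SequenceStressSketch and the
AtomicLong stand-in generator are assumptions for illustration, and a
real test would call SequenceUtil where the stand-in is used. The idea
is only to show 100 threads each drawing 1000 values and then checking
that no value was handed out twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class SequenceStressSketch {
    /**
     * Starts `threads` workers that each draw `perThread` values from
     * `source`, and returns how many distinct values were seen overall.
     * A result below threads * perThread means duplicates were issued.
     */
    static int countDistinct(LongSupplier source, int threads, int perThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int n = 0; n < perThread; n++) {
                    seen.add(source.getAsLong());
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Stand-in generator (assumption); swap in the sequence source under test.
        AtomicLong counter = new AtomicLong();
        int distinct = countDistinct(counter::incrementAndGet, 100, 1000);
        System.out.println(distinct == 100 * 1000 ? "no duplicates" : "duplicates found");
    }
}
```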

I ran my patch against your recent changes and the errors went away. I guess we can consider that issue resolved.

As for the approach I took to multi-threading the data load, here is an overview:

I was able to run certain tasks in parallel - creating entities and creating primary keys, for example. I have the number of threads allocated configured in a properties file. By tweaking that number I was able to increase CPU utilization and reduce the creation time. Of course there was a threshold beyond which CPU utilization kept rising but creation time got longer again - due to thread thrash.

Creating foreign keys must be run on a single thread to prevent database deadlocks.
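
The split described above (a parallel phase sized from a properties file, followed by foreign keys on one thread) might look roughly like the sketch below. It is not the actual patch: the property name create.threads, the class name PhasedCreateSketch, and the string log standing in for real DDL calls are all assumptions for illustration, and running tables and primary keys as two separate parallel phases is a guess at the structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PhasedCreateSketch {
    /** Reads the pool size from a hypothetical "create.threads" property (default 4). */
    static int poolSize(Properties props) {
        return Integer.parseInt(props.getProperty("create.threads", "4"));
    }

    /** Runs the three phases and returns a log of the steps in completion order. */
    static List<String> run(Properties props, List<String> entities) {
        ConcurrentLinkedQueue<String> log = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(props));
        try {
            runPhase(pool, entities, "table", log); // parallel
            runPhase(pool, entities, "pk", log);    // parallel
        } finally {
            pool.shutdown();
        }
        // Foreign keys are created one at a time on the calling thread,
        // so no two of these statements can deadlock each other.
        for (String entity : entities) {
            log.add("fk:" + entity);
        }
        return new ArrayList<>(log);
    }

    /** Submits one task per entity and waits for the whole phase to finish. */
    private static void runPhase(ExecutorService pool, List<String> entities,
                                 String step, ConcurrentLinkedQueue<String> log) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String entity : entities) {
            tasks.add(() -> {
                log.add(step + ":" + entity); // placeholder for the real DDL call
                return null;
            });
        }
        try {
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get(); // rethrows any failure from the task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during phase " + step, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("phase " + step + " failed", e.getCause());
        }
    }
}
```

Keeping the pool size in a property makes it easy to try different values and find the point where adding threads stops helping.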

I multi-threaded the data load by having one thread parse the XML files and put the results in a queue. Another thread services the queue and loads the data. I also multi-threaded the EECAs - but that has an issue I need to solve.
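
As a rough illustration of that parse/load split, here is a minimal producer-consumer sketch using a bounded queue. It is not the code from the patch: the class name QueueLoadSketch, the queue capacity, and the "record from ..." strings standing in for parsed entity values are assumptions, and the EECA handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueLoadSketch {
    /** Parses on one thread, loads on another, and returns what was loaded, in order. */
    static List<String> load(List<String> files) {
        // Bounded queue: the parser blocks when it gets too far ahead of the loader.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(16);
        List<String> stored = new ArrayList<>();

        Thread parser = new Thread(() -> {
            try {
                for (String file : files) {
                    queue.put(Optional.of("record from " + file)); // placeholder for XML parsing
                }
                queue.put(Optional.empty()); // end-of-input marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread loader = new Thread(() -> {
            try {
                Optional<String> item;
                while ((item = queue.take()).isPresent()) {
                    stored.add(item.get()); // placeholder for the database write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        loader.start();
        try {
            parser.join();
            loader.join(); // after join, the loader's writes to `stored` are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(load(List.of("a.xml", "b.xml")));
    }
}
```

With a single parser and a single loader the queue preserves file order, so the implicit dependencies between data files are still honored while parsing overlaps with database writes.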

My original goal was to reduce the ant clean-all + ant run-install cycle time. I recently purchased a much faster development machine that completes the cycle in about 2 minutes - slightly longer than the multi-threaded code, so I don't have much of an incentive to develop the patch further.

The whole experience was an educational one. There is a possibility the techniques I developed could be used to speed up import/export of large datasets. If anyone is interested in that, I am available for hire.

-Adrian

