Adrian Crum wrote:
> I ran my patch against your recent changes and the errors went away. I
> guess we can consider that issue resolved.
Yeah, I did do some changes to SequenceUtil a while back. The biggest
functional change was to move some variables from the inner class to the
outer class, and not try to access them all the time.

> As far as the approach I took to multi-threading the data load - here is
> an overview:
>
> I was able to run certain tasks in parallel - creating entities and
> creating primary keys, for example. I have the number of threads
> allocated configured in a properties file. By tweaking that number I was
> able to increase CPU utilization and reduce the creation time. Of course
> there was a threshold where CPU utilization was raised but creation time
> increased - due to thread thrash.

So each entity creation itself was a separate work unit. Once an entity
was created, you could submit the primary key creation as well. That's
simple enough to implement (in theory, anyway). This design is starting
to go towards the Sandstorm(1) approach. There are ways to find out how
many CPUs are available. Look at
org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it calls
into ManagementFactory.

> Creating foreign keys must be run on a single thread to prevent database
> deadlocks.

Maybe. If the entity and primary keys are all created for both sides of
the foreign key, then shouldn't it be possible to submit that work unit to
the pool as well?

> I multi-threaded the data load by having one thread parse the XML files
> and put the results in a queue. Another thread services the queue and
> loads the data. I also multi-threaded the EECAs - but that has an issue
> I need to solve.

Hmm. You dug deeper, splitting the work up into separate calls. I hadn't
done that yet, and just handed each XML file to a separate thread. My
approach is obviously wrong.

> My original goal was to reduce the ant clean-all + ant run-install cycle
> time. I recently purchased a much faster development machine that
> completes the cycle in about 2 minutes - slightly longer than the
> multi-threaded code, so I don't have much of an incentive to develop the
> patch further.

I've reduced the time it takes to do a run-tests loop. The changes I've
made to log4j.xml reduce the *extreme* debug logging produced by several
classes. log4j would create a new exception just so it could get the
correct class and line number to print to the log, which is a heavy-weight
operation. This mostly showed up as slowness when Catalina started up, so
this set of changes doesn't directly affect the run-install cycle.

> The whole experience was an educational one. There is a possibility the
> techniques I developed could be used to speed up import/export of large
> datasets. If anyone is interested in that, I am available for hire.

We have a site where users could upload original images (6), then fill out
a bunch of form data, and then some PDFs would be generated. I would
submit a bunch of image resize operations (I had to make 2 reduced-size
images for each of the originals). All of those can run in parallel. Then,
once all the images were done, the 2 PDFs would be submitted. This entire
pipeline might itself be run in parallel too, since the user could have
multiple such records that needed updating.

1: http://www.eecs.harvard.edu/~mdw/proj/seda/
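
To make the work-unit idea a bit more concrete, here is a rough sketch
using plain java.util.concurrent rather than the real delegator or
ExecutionPool APIs (createEntity and createPrimaryKey are just
placeholders; the pool sizing reads the processor count through
ManagementFactory, which is the same information getNewOptimalExecutor()
uses):

import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class EntityLoadSketch {

    public static void main(String[] args) throws Exception {
        // Size the pool from the number of available processors - the same
        // information ExecutionPool.getNewOptimalExecutor() obtains via
        // ManagementFactory.
        int cpus = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);

        List<String> entityNames = Arrays.asList("Party", "Product", "OrderHeader");
        List<Future<?>> entityFutures = new ArrayList<Future<?>>();
        for (final String name : entityNames) {
            // Each entity creation is its own work unit; once it finishes,
            // the primary key creation for that entity is submitted as a
            // follow-up work unit on the same pool.
            entityFutures.add(pool.submit(() -> {
                createEntity(name);                        // placeholder
                pool.submit(() -> createPrimaryKey(name)); // placeholder
            }));
        }
        for (Future<?> f : entityFutures) {
            f.get(); // propagate any failure from the entity work units
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // let the PK work units drain
    }

    private static void createEntity(String name) { /* placeholder */ }

    private static void createPrimaryKey(String name) { /* placeholder */ }
}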

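And here is a minimal sketch of the parse-thread/load-thread split you
describe: one producer parsing files onto a BlockingQueue, one consumer
servicing the queue (parseFile and loadValue are placeholders, not the
real OFBiz data-load calls):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ParseLoadQueueSketch {

    // Poison pill that tells the loader thread the parser is finished.
    private static final String EOF = new String("EOF");

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);
        final List<String> xmlFiles = Arrays.asList("DemoProduct.xml", "DemoOrder.xml");

        // Producer: parses the XML files and puts the parsed values on the queue.
        Thread parser = new Thread(() -> {
            try {
                for (String file : xmlFiles) {
                    for (String value : parseFile(file)) { // placeholder
                        queue.put(value);
                    }
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: services the queue and loads each value as it arrives.
        Thread loader = new Thread(() -> {
            try {
                for (String value = queue.take(); value != EOF; value = queue.take()) {
                    loadValue(value);                      // placeholder
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        loader.start();
        parser.join();
        loader.join();
    }

    private static List<String> parseFile(String file) { return new ArrayList<String>(); }

    private static void loadValue(String value) { /* placeholder */ }
}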