Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Adrian Crum Sat, 03 Apr 2010 14:23:23 -0700

--- On Sat, 4/3/10, Adam Heath <[email protected]> wrote:
> Adrian Crum wrote:
> > I multi-threaded the data load by having one thread parse the XML files
> > and put the results in a queue. Another thread services the queue and
> > loads the data. I also multi-threaded the EECAs - but that has an issue
> > I need to solve.
> 
> Well, there could be some EECAs that have dependencies on each other,
> when defined in a single definition file.  Or, they have implicit
> dependencies with other earlier defined ecas.  Like, maybe an order
> eca assuming that a product eca has run, just because ofbiz has always
> loaded the product component before the order component.


I used a FIFO queue serviced by a single thread for the EECAs - to preserve the 
sequence. The main idea was to offload the EECA execution from the thread that 
triggered the EECA. The data load was also in a FIFO queue serviced by a single 
thread so the files were being loaded in order.

To summarize:

1. Table creation is handled by a thread pool with an adjustable size. Each
task creates one table and its primary keys, and the tasks run in parallel.
The main thread blocks until all tables and primary keys are created.
2. The main thread creates the foreign keys.
3. The main thread parses the XML files and puts the results in the data load
queue.
4. A data load thread services the data load queue and stores the data. If an
ECA is triggered, it puts the ECA info in an ECA queue.
5. An ECA thread services the ECA queue and runs the ECAs.
6. The main thread blocks until all queues are empty.
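
For anyone who wants to see the shape of it, the six steps above can be sketched roughly as follows. This is not the actual patch, only a minimal illustration using plain java.util.concurrent classes. The class name DataLoadPipelineSketch, the run() and drain() methods, and the DONE marker are all made up for the example. The real table creation, XML parsing, and ECA execution are replaced by log entries, and every stored record is assumed to trigger an ECA.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the six-step load; not OFBiz code.
public class DataLoadPipelineSketch {
    // End-of-input marker, compared by identity so it cannot collide with real data.
    private static final String DONE = new String("DONE");

    public static List<String> run(List<String> tables, List<String> records, int poolSize) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Step 1: one task per table; invokeAll blocks until every task is done.
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String table : tables) {
                tasks.add(() -> { log.add("create " + table); return null; });
            }
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any failure from a table task
            }
            // Step 2: foreign keys are created on the main thread.
            log.add("foreign keys");

            // Steps 4 and 5: one thread per FIFO queue, so order is preserved.
            BlockingQueue<String> loadQueue = new LinkedBlockingQueue<>();
            BlockingQueue<String> ecaQueue = new LinkedBlockingQueue<>();
            Thread loader = new Thread(() -> drain(loadQueue, ecaQueue, "store ", log));
            Thread ecaRunner = new Thread(() -> drain(ecaQueue, null, "eca ", log));
            loader.start();
            ecaRunner.start();

            // Step 3: the main thread "parses" and feeds the data load queue.
            for (String record : records) {
                loadQueue.put(record);
            }
            loadQueue.put(DONE);

            // Step 6: wait until both queues have been drained.
            loader.join();
            ecaRunner.join();
            return log;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("data load failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Takes items from 'in' until DONE, logs each one, and forwards it to 'out' if present.
    private static void drain(BlockingQueue<String> in, BlockingQueue<String> out,
                              String label, List<String> log) {
        try {
            for (String item = in.take(); item != DONE; item = in.take()) {
                log.add(label + item);
                if (out != null) out.put(item);
            }
            if (out != null) out.put(DONE);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the two queues is that the parser, the loader, and the ECA runner can each keep working while another one is waiting on the database, and a single consumer per queue keeps everything in its original order.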

> This is a difficult problem to solve; probably not worth it.  During
> production, different high-level threads, modifying different
> entities, will run faster, they are already running in multiple threads.
> 
> Most ecas (entity, and probably service) generally run relatively fast.
> Trying to break that up and dispatch into a thread pool might make
> things slower, as you have cpu cache coherency effects to contend with.
> 
> What would be better, is to break up the higher levels into more
> threads, during an install.  That could be made semi-smart, if we add
> file dependencies to the data xml files.  Such explicit dependencies
> will have to be done by hand.  Then, a parallel execution framework,
> that ran each xml file in parallel, once all of its dependencies were
> met, would give us a speedup.
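
For reference, a dependency-driven loader like the one described above could look roughly like the sketch below. It is only an illustration of the idea, not anything that exists in OFBiz. The class DependencyLoaderSketch and its loadAll() and schedule() methods are hypothetical, the "load" of a file is just a list append, and the sketch assumes the hand-written dependency graph has no cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: load each file as soon as all of its dependencies are loaded.
public class DependencyLoaderSketch {
    // deps maps a file name to the files that must be loaded before it.
    // Assumes the graph is acyclic; a cycle would recurse forever.
    public static List<String> loadAll(Map<String, List<String>> deps, int threads) {
        List<String> loaded = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String file : deps.keySet()) {
                schedule(file, deps, futures, loaded, pool);
            }
            // Block until every file has been loaded.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return loaded;
    }

    // Returns the future for 'file', creating it (and its prerequisites) on first use.
    private static CompletableFuture<Void> schedule(String file,
            Map<String, List<String>> deps, Map<String, CompletableFuture<Void>> futures,
            List<String> loaded, ExecutorService pool) {
        CompletableFuture<Void> existing = futures.get(file);
        if (existing != null) {
            return existing;
        }
        List<CompletableFuture<Void>> prereqs = new ArrayList<>();
        for (String dep : deps.getOrDefault(file, Collections.emptyList())) {
            prereqs.add(schedule(dep, deps, futures, loaded, pool));
        }
        // The load task is handed to the pool only after all prerequisites complete.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(prereqs.toArray(new CompletableFuture[0]))
                .thenRunAsync(() -> loaded.add(file), pool);
        futures.put(file, future);
        return future;
    }
}
```

With a map such as product.xml -> [], order.xml -> [product.xml] (file names invented for the example), order.xml is never started before product.xml has finished, while unrelated files are free to run side by side.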

The minor changes I made cut the data load time in half. That's not fast 
enough? ;-)

It didn't take a lot of threads or a lot of thought to speed things up. The 
bottom line is, you want to keep parts of the process going while waiting for 
DB I/O.
