Adrian Crum wrote:
> --- On Sat, 4/3/10, Adam Heath <[email protected]> wrote:
>> Adrian Crum wrote:
>>> I multi-threaded the data load by having one thread parse the XML
>>> files and put the results in a queue. Another thread services the
>>> queue and loads the data. I also multi-threaded the EECAs - but that
>>> has an issue I need to solve.
>> Well, there could be some EECAs that have dependencies on each other,
>> when defined in a single definition file.  Or, they have implicit
>> dependencies with other earlier defined ecas.  Like, maybe an order
>> eca assuming that a product eca has run, just because ofbiz has always
>> loaded the product component before the order component.
> 
> I used a FIFO queue serviced by a single thread for the EECAs - to preserve 
> the sequence. The main idea was to offload the EECA execution from the thread 
> that triggered the EECA. The data load was also in a FIFO queue serviced by a 
> single thread so the files were being loaded in order.
> 
> To summarize:
> 
> 1. Table creation is handled by a thread pool with an adjustable size. A 
> thread task is to create a table and its primary keys. Thread tasks run in 
> parallel. Main thread blocks until all tables and primary keys are created.
> 2. Main thread creates foreign keys.
> 3. Main thread parses XML files, puts results in data load queue.
> 4. A data load thread services the data load queue and stores the data. If an
> ECA is triggered, it puts the ECA info in an ECA queue.
> 5. An ECA thread services the ECA queue and runs the ECA.
> 6. Main thread blocks until all queues are empty.
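
Just to make sure I'm reading steps 3-5 right, the handoff I picture is
roughly the following.  The class and method names are mine, purely
illustrative - none of this is from your patch or from existing ofbiz
code:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Rough sketch of the queue handoff described above (steps 3-5).
public class DataLoadPipeline {
    // parsed rows waiting to be stored (step 3 -> step 4)
    private final BlockingQueue<Map<String, Object>> dataQueue =
            new LinkedBlockingQueue<Map<String, Object>>();
    // triggered ecas waiting to be run (step 4 -> step 5)
    private final BlockingQueue<Runnable> ecaQueue =
            new LinkedBlockingQueue<Runnable>();

    // single consumer, so rows are stored in file order
    private final Thread dataLoader = new Thread(new Runnable() {
        public void run() {
            try {
                while (true) {
                    Map<String, Object> row = dataQueue.take();
                    store(row);              // stand-in for the real storage call
                    for (Runnable eca : triggeredEcas(row)) {
                        ecaQueue.put(eca);   // hand off instead of running inline
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }, "data-load");

    // single consumer, so ecas run in FIFO order
    private final Thread ecaRunner = new Thread(new Runnable() {
        public void run() {
            try {
                while (true) {
                    ecaQueue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }, "eca-runner");

    public void start() {
        dataLoader.start();
        ecaRunner.start();
    }

    // called by the XML-parsing (main) thread, step 3
    public void enqueue(Map<String, Object> row) throws InterruptedException {
        dataQueue.put(row);
    }

    // stand-ins, not real ofbiz APIs
    void store(Map<String, Object> row) { }
    List<Runnable> triggeredEcas(Map<String, Object> row) { return Collections.emptyList(); }
}

The single consumer on each queue is what preserves the FIFO ordering
you mention; it's the decoupling between the two consumers that worries
me.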

Except that if an eca fires but the main data load thread keeps going,
the load thread might insert/update something that hasn't yet been
manipulated by the eca(s) - order data that assumes a product eca has
already run, for example.

Additionally, an eca can run a service, which can do anything,
including adding/updating/removing other values, which in turn causes
other ecas to fire - and those then interact with the ecas already
sitting in the queue.

Were your changes only active at startup, during the initial install,
or were they always available?  When data is later manipulated, during
a test run, certain guarantees still have to be met (which I'm sure you
know).

>> This is a difficult problem to solve; probably not worth it.  During
>> production, different high-level threads, modifying different
>> entities, will run faster, since they are already running in multiple
>> threads.
>>
>> Most ecas (entity, and probably service) generally run relatively
>> fast.  Trying to break that up and dispatch into a thread pool might
>> make things slower, as you have cpu cache coherency effects to
>> contend with.
>>
>> What would be better is to break up the higher levels into more
>> threads during an install.  That could be made semi-smart if we add
>> file dependencies to the data xml files.  Such explicit dependencies
>> would have to be done by hand.  Then a parallel execution framework
>> that ran each xml file in parallel, once all of its dependencies were
>> met, would give us a speedup.
> 
> The minor changes I made cut the data load time in half. That's not fast 
> enough? ;-)
> 
> It didn't take a lot of threads or a lot of thought to speed things up. The 
> bottom line is, you want to keep parts of the process going while waiting for 
> DB I/O.
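
To make the file-dependency idea from my earlier mail a bit more
concrete, the shape I have in mind is roughly this.  DataFile and
everything else below is hypothetical - not existing ofbiz code - and
the per-file dependency lists would be the hand-written ones I
mentioned:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of "run each data file in parallel, once its declared
// dependencies have loaded".  Files must be passed in dependency order
// (every file after the files it depends on), so a blocked worker is
// always waiting on a file some other worker has already started.
public class ParallelDataLoader {

    public interface DataFile {
        String getName();
        List<String> getDependencies(); // names of files that must load first
        void load();                    // parse and store this file's data
    }

    public static void loadAll(List<DataFile> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final Map<String, CountDownLatch> done = new HashMap<String, CountDownLatch>();
        for (DataFile file : files) {
            done.put(file.getName(), new CountDownLatch(1));
        }
        List<Future<?>> results = new ArrayList<Future<?>>();
        for (final DataFile file : files) {
            results.add(pool.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    // block until every declared dependency has finished loading
                    for (String dep : file.getDependencies()) {
                        done.get(dep).await();
                    }
                    file.load();
                    done.get(file.getName()).countDown();
                    return null;
                }
            }));
        }
        // main thread blocks until every file has loaded; get() propagates failures
        for (Future<?> result : results) {
            result.get();
        }
        pool.shutdown();
    }
}

The pool size caps the parallelism, and independent files overlap their
DB I/O, which is the same win you're after - just driven by the declared
dependencies instead of a single fixed file order.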

As for run-install, it starts up catalina.  It'd be nice if that were
multi-threaded as well.  But catalina appears to be serial internally.
