Adrian Crum wrote:
> --- On Sat, 4/3/10, Adam Heath <[email protected]> wrote:
>> Adrian Crum wrote:
>>> I multi-threaded the data load by having one thread parse the XML files
>>> and put the results in a queue. Another thread services the queue and
>>> loads the data. I also multi-threaded the EECAs - but that has an issue
>>> I need to solve.
>> Well, there could be some EECAs that have dependencies on each other,
>> when defined in a single definition file. Or, they have implicit
>> dependencies with other earlier defined ecas. Like, maybe an order
>> eca assuming that a product eca has run, just because ofbiz has always
>> loaded the product component before the order component.
>
> I used a FIFO queue serviced by a single thread for the EECAs - to preserve
> the sequence. The main idea was to offload the EECA execution from the thread
> that triggered the EECA. The data load was also in a FIFO queue serviced by a
> single thread so the files were being loaded in order.
>
> To summarize:
>
> 1. Table creation is handled by a thread pool with an adjustable size. A
> thread task is to create a table and its primary keys. Thread tasks run in
> parallel. Main thread blocks until all tables and primary keys are created.
> 2. Main thread creates foreign keys.
> 3. Main thread parses XML files, puts results in data load queue.
> 4. A data load thread services the data load queue and stores the data. If an
> ECA is triggered it puts the ECA info in an ECA queue.
> 5. An ECA thread services the ECA queue and runs the ECA.
> 6. Main thread blocks until all queues are empty.
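The six steps summarized above map naturally onto a thread pool plus two single-consumer FIFO queues. What follows is a minimal, hypothetical sketch of that shape using only `java.util.concurrent`; it is not Adrian's actual patch or any OFBiz API. The class name `LoadPipelineSketch`, the `POISON` end-of-queue marker, and the "records starting with `Order` trigger an ECA" rule are all invented for illustration, and "creating a table" or "storing a record" is simulated by appending to a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the six-step load pipeline; not OFBiz code.
public class LoadPipelineSketch {
    // Sentinel that tells a consumer thread no more work is coming.
    private static final String POISON = "__END__";

    public static final class Result {
        public final List<String> log, stored, ecas;
        Result(List<String> log, List<String> stored, List<String> ecas) {
            this.log = log; this.stored = stored; this.ecas = ecas;
        }
    }

    public static Result run(List<String> tables, List<String> records, int poolSize)
            throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // Step 1: create each table (and its primary key) on a pool of adjustable size;
        // invokeAll blocks the main thread until every task has finished.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String table : tables) {
            tasks.add(() -> { log.add("create " + table); return null; });
        }
        pool.invokeAll(tasks);
        pool.shutdown();

        // Step 2: foreign keys are created on the main thread once all tables exist.
        log.add("foreign keys");

        BlockingQueue<String> dataQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> ecaQueue = new LinkedBlockingQueue<>();
        List<String> stored = new ArrayList<>(); // only touched by the load thread until join()
        List<String> ecas = new ArrayList<>();   // only touched by the ECA thread until join()

        // Step 4: one data load thread stores records in FIFO order and queues any
        // triggered ECA instead of running it inline.
        Thread loadThread = new Thread(() -> {
            try {
                for (String r = dataQueue.take(); !r.equals(POISON); r = dataQueue.take()) {
                    stored.add(r);
                    if (r.startsWith("Order")) { // hypothetical ECA trigger condition
                        ecaQueue.put("eca:" + r);
                    }
                }
                ecaQueue.put(POISON); // no more ECAs can be triggered
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Step 5: one ECA thread, so ECAs run in the order they were triggered.
        Thread ecaThread = new Thread(() -> {
            try {
                for (String e = ecaQueue.take(); !e.equals(POISON); e = ecaQueue.take()) {
                    ecas.add(e);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        loadThread.start();
        ecaThread.start();

        // Step 3: the main thread "parses" the files and feeds the data load queue.
        for (String r : records) {
            dataQueue.put(r);
        }
        dataQueue.put(POISON);

        // Step 6: block until both queues have been drained.
        loadThread.join();
        ecaThread.join();
        return new Result(log, stored, ecas);
    }
}
```

Note that the load thread never waits for the ECA thread: a later record can be stored before the ECA triggered by an earlier record has run. Ordering is preserved within each queue, but not between them.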
Except if an eca fires, but the main data load thread keeps going, then
the main data load thread might insert/update something that hasn't yet
been manipulated by the eca(s). Additionally, an eca can run a service,
which can do anything, including adding/updating/removing other values,
which causes other ecas to fire. Those then interact with the queue-based
eca.

Were your changes only active at startup, during the initial install, or
were they always available? When data is later manipulated, during a test
run, certain guarantees still have to be met (which I'm sure you know).

>> This is a difficult problem to solve; probably not worth it. During
>> production, different high-level threads, modifying different entities,
>> will run faster; they are already running in multiple threads.
>>
>> Most ecas (entity, and probably service) generally run relatively fast.
>> Trying to break that up and dispatch into a thread pool might make
>> things slower, as you have cpu cache coherency effects to contend with.
>>
>> What would be better is to break up the higher levels into more
>> threads during an install. That could be made semi-smart if we add
>> file dependencies to the data xml files. Such explicit dependencies
>> will have to be done by hand. Then, a parallel execution framework
>> that ran each xml file in parallel, once all of its dependencies were
>> met, would give us a speedup.
>
> The minor changes I made cut the data load time in half. That's not fast
> enough? ;-)
>
> It didn't take a lot of threads or a lot of thought to speed things up. The
> bottom line is, you want to keep parts of the process going while waiting for
> DB I/O.

As for run-install, it starts up catalina. It'd be nice if that were
multi-threaded as well. But catalina appears to be serial internally.
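The dependency-driven idea quoted above (run each data xml file in parallel once all of its declared dependencies are met) could look roughly like the following. This is a hypothetical sketch built on `CompletableFuture`, not an existing OFBiz facility. It assumes the hand-written dependencies are available as a map from file name to prerequisite file names. The class name `DependencyLoaderSketch` and the file names in the usage are invented, and "loading" a file is simulated by recording its name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: run each data file as soon as all of its declared dependencies are done.
public class DependencyLoaderSketch {

    /** deps maps each data file to the files it must wait for; returns the completion order. */
    public static List<String> loadAll(Map<String, List<String>> deps, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>(); // main thread only
        try {
            for (String file : deps.keySet()) {
                schedule(file, deps, futures, completed, pool, new HashSet<>());
            }
            // Block until every scheduled file has been loaded.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        return completed;
    }

    private static CompletableFuture<Void> schedule(String file, Map<String, List<String>> deps,
            Map<String, CompletableFuture<Void>> futures, List<String> completed,
            ExecutorService pool, Set<String> visiting) {
        CompletableFuture<Void> existing = futures.get(file);
        if (existing != null) {
            return existing; // already scheduled via another path
        }
        if (!visiting.add(file)) {
            throw new IllegalStateException("dependency cycle involving " + file);
        }
        List<String> prereqs = deps.getOrDefault(file, List.of());
        CompletableFuture<?>[] waitFor = new CompletableFuture<?>[prereqs.size()];
        for (int i = 0; i < prereqs.size(); i++) {
            waitFor[i] = schedule(prereqs.get(i), deps, futures, completed, pool, visiting);
        }
        // The load starts on the pool only after every prerequisite has finished.
        CompletableFuture<Void> f = CompletableFuture.allOf(waitFor)
                .thenRunAsync(() -> completed.add(file), pool); // stand-in for loading the file
        visiting.remove(file);
        futures.put(file, f);
        return f;
    }
}
```

With dependencies such as `order.xml -> [product.xml]` and `accounting.xml -> [order.xml, party.xml]`, `product.xml` and `party.xml` can load concurrently, while `order.xml` and `accounting.xml` wait. A cycle in the hand-written declarations is reported up front rather than deadlocking the pool.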
