--- On Sat, 4/3/10, Adam Heath <[email protected]> wrote:
> Adrian Crum wrote:
> > --- On Sat, 4/3/10, Adam Heath <[email protected]> wrote:
> >> Adrian Crum wrote:
> >>> I multi-threaded the data load by having one thread parse the XML files
> >>> and put the results in a queue. Another thread services the queue and
> >>> loads the data. I also multi-threaded the EECAs - but that has an issue
> >>> I need to solve.
> >> Well, there could be some EECAs that have dependencies on each other,
> >> when defined in a single definition file. Or, they have implicit
> >> dependencies with other earlier defined ecas. Like, maybe an order
> >> eca assuming that a product eca has run, just because ofbiz has always
> >> loaded the product component before the order component.
> >
> > I used a FIFO queue serviced by a single thread for the EECAs - to
> > preserve the sequence. The main idea was to offload the EECA execution
> > from the thread that triggered the EECA. The data load was also in a
> > FIFO queue serviced by a single thread so the files were being loaded
> > in order.
> >
> > To summarize:
> >
> > 1. Table creation is handled by a thread pool with an adjustable size.
> >    A thread task is to create a table and its primary keys. Thread
> >    tasks run in parallel. Main thread blocks until all tables and
> >    primary keys are created.
> > 2. Main thread creates foreign keys.
> > 3. Main thread parses XML files, puts results in data load queue.
> > 4. A data load thread services the data load queue and stores the
> >    data. If an ECA is triggered it puts the ECA info in an ECA queue.
> > 5. An ECA thread services the ECA queue and runs the ECA.
> > 6. Main thread blocks until all queues are empty.
>
> Except if an eca fires, but the main data load thread keeps going,
> then the main data load thread might insert/update something that
> hasn't yet been manipulated by the eca(s).

Good point. Maybe that's the problem I'm having and need to track down.

> Additionally, an eca can run a service, which can do anything,
> including adding/updating/removing other values, which cause other
> ecas to fire. Which then interact with the queue-based eca.
>
> Were your changes only active at startup, during the initial install,
> or were they always available? When data is later manipulated, during
> a test run, certain guarantees still have to be met (which I'm sure you
> know).

It was just for run-install.

> >> This is a difficult problem to solve; probably not worth it. During
> >> production, different high-level threads, modifying different
> >> entities, will run faster, they are already running in multiple
> >> threads.
> >>
> >> Most ecas (entity, and probably service) generally run relatively
> >> fast. Trying to break that up and dispatch into a thread pool might
> >> make things slower, as you have cpu cache coherency effects to
> >> contend with.
> >>
> >> What would be better, is to break up the higher levels into more
> >> threads, during an install. That could be made semi-smart, if we add
> >> file dependencies to the data xml files. Such explicit dependencies
> >> will have to be done by hand. Then, a parallel execution framework,
> >> that ran each xml file in parallel, once all of its dependencies were
> >> met, would give us a speedup.
> >
> > The minor changes I made cut the data load time in half. That's not
> > fast enough? ;-)
> >
> > It didn't take a lot of threads or a lot of thought to speed things
> > up. The bottom line is, you want to keep parts of the process going
> > while waiting for DB I/O.
>
> As for run-install, it starts up catalina. It'd be nice if that were
> multi-threaded as well. But catalina appears to be serial internally.

Getting back to SEDA...

We could implement a SEDA-like architecture in a separate control servlet and try it out on different applications by changing their web.xml files. If we had access to the author's test code we could see if it made a difference in overload situations. Where I work we have a classroom filled with computers that could be used as clients to test a SEDA server.
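
For reference, here is a rough Java sketch of steps 3 to 6 from the summary quoted above: one FIFO queue per stage, each serviced by a single thread. It is not the code I actually wrote and nothing in it is an OFBiz API. The class name, the event strings, and the assumption that every stored file triggers exactly one ECA are all made up for illustration. A queue feeding a dedicated thread is also roughly the building block a SEDA stage uses, which is why I mention it here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: none of these names come from OFBiz.
class InstallPipelineSketch {

    // Runs the pipeline and returns the events in the order they occurred.
    static List<String> run(List<String> xmlFiles) {
        List<String> events = new CopyOnWriteArrayList<>();

        // A single-thread executor is an unbounded FIFO queue serviced by
        // one thread, so submission order is preserved within each stage.
        ExecutorService loadStage = Executors.newSingleThreadExecutor();
        ExecutorService ecaStage = Executors.newSingleThreadExecutor();

        // Step 3: the main thread "parses" each file and queues the result.
        for (String file : xmlFiles) {
            events.add("parsed " + file); // stand-in for real XML parsing
            // Step 4: the data load thread stores the data.
            loadStage.execute(() -> {
                events.add("stored " + file);
                // Assumption: every store triggers one ECA.
                // Step 5: the ECA thread runs it later, off the load thread.
                ecaStage.execute(() -> {
                    events.add("eca " + file);
                });
            });
        }

        // Step 6: block until both queues are drained. The load stage must
        // finish first because it is the one feeding the ECA stage.
        drain(loadStage);
        drain(ecaStage);
        return events;
    }

    private static void drain(ExecutorService stage) {
        stage.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!stage.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("stage did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("ProductData.xml", "OrderData.xml");
        for (String event : run(files)) {
            System.out.println(event);
        }
    }
}
```

Within each stage the order is preserved, but nothing stops the load thread from storing the second file before the ECA for the first file has run. That is exactly the race Adam describes, and the sketch makes no attempt to fix it. It also does not model an ECA that triggers further ECAs.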
