That would work well and has the advantage that it can operate on an arbitrarily large set of documents. The downside is that you'll only have one processing thread working at a time.
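For anyone skimming the thread, here is a rough sketch of the re-spawning pattern from the quoted message below. It is an assumption-laden illustration, not Mike's actual script: the module path /fix-batch.xqy, the "needs-fix" collection, and the batch size are all hypothetical, and the real per-document fix would go where the comment indicates.

```xquery
xquery version "1.0-ml";

(: Statement 1: patch one batch of documents in its own transaction.
   The collection name and batch size are hypothetical. :)
declare variable $BATCH-SIZE := 500;

for $doc in fn:subsequence(fn:collection("needs-fix"), 1, $BATCH-SIZE)
return (
  (: ... apply the real fix to $doc here ... :)
  (: Removing the document from the work collection is what makes
     the remaining set shrink, so the loop eventually terminates. :)
  xdmp:document-remove-collections(xdmp:node-uri($doc), "needs-fix")
)

;

xquery version "1.0-ml";

(: Statement 2: runs as a separate transaction, so it sees the
   updates committed by statement 1. :)
let $remaining :=
  xdmp:estimate(cts:search(fn:doc(), cts:collection-query("needs-fix")))
return
  if ($remaining gt 0) then (
    xdmp:log(fn:concat("fix-batch: ", $remaining, " documents remaining")),
    (: Re-spawn this same module (hypothetical path) for the next batch. :)
    xdmp:spawn("/fix-batch.xqy")
  )
  else
    xdmp:log("fix-batch: done")
```

Because each run spawns at most one successor, only one task is ever in flight, which is why this pattern uses a single processing thread.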
Another approach is to have an initial query create a list of documents to process and cut it into chunks (say 1000 documents each) that are each handed off to a spawned task. With this, the configured number of threads in the task queue will run in parallel, giving you higher overall throughput.

Wayne.

On Wed, 2010-04-07 at 06:26 -0700, Mike Sokolov wrote:
> Perhaps this won't be news to others on the list, but I was so excited
> to finally stumble on a solution to a problem I have been struggling
> with for years, that I just had to share.
>
> The problem: how to process a large number of documents using xquery only?
>
> This can't be done easily because if all the work is done in a single
> transaction, it eventually runs out of time and space. But xquery
> modules don't provide an obvious mechanism for flow control across
> multiple transactions.
>
> In the past I've done this by writing an "outer loop" in Java, and more
> recently I tried using CPF. The problem with Java is that it's
> cumbersome to set up and requires some configuration to link it to a
> database. I had some success with CPF, but I found it to be somewhat
> inflexible since it requires a database insert or update to trigger
> processing. It also requires a bit of configuration to get going.
> Often I find I just want to run through a set of existing documents and
> patch them up in some way or another, (usually to clean up some earlier
> mistake!)
>
> Finally I hit on the solution: I wrote a simple script that fetches a
> batch of documents to be updated, processes the updates, and then, using
> a new statement after ";" to separate multiple transactions, re-spawns
> the same script if there is more work to be done after logging some
> indication of progress. Presto - an iterative processor. This
> technique is a little sensitive to running away into an infinite loop if
> you're not careful about the termination condition, but it has many
> advantages over the other methods.
>
> What do you think?
>
>
> Michael Sokolov
> Engineering Director
> www.ifactory.com
> @iFactoryBoston
>
> PubFactory: the revolutionary e-publishing platform from iFactory
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
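A minimal sketch of the chunked approach suggested at the top of this reply, under several assumptions: the URI lexicon is enabled (cts:uris needs it), the documents to process sit in a hypothetical "needs-fix" collection, and /process-chunk.xqy is a hypothetical worker module. The chunk is passed as a single newline-joined string so that the variable list stays a simple alternation of names and values.

```xquery
xquery version "1.0-ml";

(: Chunk size taken from the "say 1000 documents each" suggestion. :)
declare variable $CHUNK-SIZE := 1000;

(: Initial query: build the full list of URIs to process.
   Assumes the URI lexicon is enabled on the database. :)
let $uris  := cts:uris((), (), cts:collection-query("needs-fix"))
let $count := fn:count($uris)

(: One iteration per chunk: 0, 1, 2, ... :)
for $i in 0 to ($count - 1) idiv $CHUNK-SIZE
let $chunk := fn:subsequence($uris, $i * $CHUNK-SIZE + 1, $CHUNK-SIZE)
(: Skip the single empty chunk produced when there is nothing to do. :)
where fn:exists($chunk)
return
  (: Each chunk becomes its own task on the task queue, so chunks
     are processed in parallel up to the configured thread count. :)
  xdmp:spawn(
    "/process-chunk.xqy",
    (xs:QName("URIS"), fn:string-join($chunk, "&#10;"))
  )
```

The worker module would then declare `$URIS` as an external xs:string variable and recover the list with `fn:tokenize($URIS, "&#10;")` before updating each document.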
