Hi Mike, I wrote something *very* similar just last December! It isn't fully generic, though, and I haven't had the time to test it more thoroughly, but it has exactly the same basic idea: what to do when you want to process large sets of documents but want to rely on MarkLogic Server alone.
I had written a wrapper to execute a particular module that took a few parameters. The script was able to spawn itself, and could be configured to run on a particular number of threads, to pause for a specific period between batches, to use any batch size, and so on. :-)

Kind regards,
Geert

drs. G.P.H. (Geert) Josten
Consultant
Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk
T +31 (0)10 850 1200
F +31 (0)10 850 1199
mailto:[email protected]
http://www.daidalos.nl/
KvK 27164984

P Please consider the environment before printing this mail.

The information sent in or with this e-mail message originates from Daidalos BV and is intended solely for the addressee. If you have received this message unintentionally, we ask you to delete it. No rights can be derived from this message.

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Mike Sokolov
> Sent: Wednesday, April 7, 2010 15:26
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] tail-recursion with xdmp:spawn
>
> Perhaps this won't be news to others on the list, but I was so excited to
> finally stumble on a solution to a problem I have been struggling with for
> years that I just had to share.
>
> The problem: how to process a large number of documents using XQuery only?
>
> This can't be done easily, because if all the work is done in a single
> transaction, it eventually runs out of time and space. But XQuery modules
> don't provide an obvious mechanism for flow control across multiple
> transactions.
>
> In the past I've done this by writing an "outer loop" in Java, and more
> recently I tried using CPF. The problem with Java is that it's cumbersome
> to set up and requires some configuration to link it to a database. I had
> some success with CPF, but I found it somewhat inflexible, since it
> requires a database insert or update to trigger processing. It also
> requires a bit of configuration to get going.
> Often I find I just want to run through a set of existing documents and
> patch them up in some way or another (usually to clean up some earlier
> mistake!).
>
> Finally I hit on the solution: I wrote a simple script that fetches a
> batch of documents to be updated, processes the updates, and then, using a
> new statement after ";" to separate multiple transactions, re-spawns the
> same script if there is more work to be done, after logging some
> indication of progress. Presto - an iterative processor. This technique
> is a little sensitive to running away into an infinite loop if you're not
> careful about the termination condition, but it has many advantages over
> the other methods.
>
> What do you think?
>
>
> Michael Sokolov
> Engineering Director
> www.ifactory.com
> @iFactoryBoston
>
> PubFactory: the revolutionary e-publishing platform from iFactory
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
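For readers following along, the pattern Mike describes could look roughly like the sketch below: two statements separated by ";" so they run as separate transactions, with the second re-spawning the module while work remains. This is not Mike's actual script - the module path "/batch-patch.xqy", the "to-patch" collection, the batch size, and the patch itself are all illustrative assumptions.

```
xquery version "1.0-ml";

(: Transaction 1: patch one batch of documents, removing each one from
   the hypothetical "to-patch" collection so it is not selected again.
   Emptying the collection is the termination condition - if a patched
   document never left it, the module would re-spawn forever. :)
for $doc in collection("to-patch")[1 to 100]
return (
  (: stand-in for the real patch :)
  xdmp:node-replace($doc/article/@status, attribute status { "clean" }),
  xdmp:document-remove-collections(xdmp:node-uri($doc), "to-patch")
);

(: Transaction 2: log progress and re-spawn this module if work remains.
   Because this is a separate statement, it sees the updates committed by
   the first transaction. :)
if (xdmp:estimate(collection("to-patch")) > 0)
then (
  xdmp:log(fn:concat("batch-patch: ",
    xdmp:estimate(collection("to-patch")), " documents remaining")),
  xdmp:spawn("/batch-patch.xqy")
)
else xdmp:log("batch-patch: done")
```

The spawned task runs on the task server with its own time limit, so each iteration stays small while the overall run can cover any number of documents.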
