Hi Mike,

I wrote something *very* similar just last December! It wasn't fully 
generic, and I haven't had the time to test it more thoroughly, but it is 
based on exactly the same idea: what to do when you want to load large sets of 
documents, but want to rely on MarkLogic Server alone.

I had written a wrapper that executes a particular module with a few 
parameters. The script could spawn itself, and could be configured to run on 
a particular number of threads, pause for a specific period between batches, 
use any batch size, etc.

:-)

Kind regards,
Geert



drs. G.P.H. (Geert) Josten
Consultant


Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk

T +31 (0)10 850 1200
F +31 (0)10 850 1199

mailto:[email protected]
http://www.daidalos.nl/

KvK 27164984

Please consider the environment before printing this mail.
The information sent in or with this e-mail message originates from 
Daidalos BV and is intended solely for the addressee. If you have received 
this message unintentionally, we kindly ask you to delete it. No rights can 
be derived from this message.

> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Mike Sokolov
> Sent: Wednesday, 7 April 2010 15:26
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] tail-recursion with xdmp:spawn
>
> Perhaps this won't be news to others on the list, but I was
> so excited to finally stumble on a solution to a problem I
> have been struggling with for years, that I just had to share.
>
> The problem: how to process a large number of documents using
> xquery only?
>
> This can't be done easily because if all the work is done in
> a single transaction, it eventually runs out of time and
> space.  But xquery modules don't provide an obvious mechanism
> for flow control across multiple transactions.
>
> In the past I've done this by writing an "outer loop" in
> Java, and more recently I tried using CPF.  The problem with
> Java is that it's cumbersome to set up and requires some
> configuration to link it to a database.  I had some success
> with CPF, but I found it to be somewhat inflexible since it
> requires a database insert or update to trigger processing.
> It also requires a bit of configuration to get going.
> Often I find I just want to run through a set of existing
> documents and patch them up in some way or another (usually
> to clean up some earlier mistake!)
>
> Finally I hit on the solution: I wrote a simple script that
> fetches a batch of documents to be updated, processes the
> updates, and then, using a new statement after ";" to
> separate multiple transactions, re-spawns the same script if
> there is more work to be done after logging some indication
> of progress.  Presto - an iterative processor.  This
> technique is a little sensitive to running away into an
> infinite loop if you're not careful about the termination
> condition, but it has many advantages over the other methods.
>
> What do you think?
>
>
> Michael Sokolov
> Engineering Director
> www.ifactory.com
> @iFactoryBoston
>
> PubFactory: the revolutionary e-publishing platform from iFactory
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>