Hi Kelly,

 

Would you elaborate on the new functionality of 4.2 to which you are
referring?  I'm using a CPF to effectively spawn tasks on individual
documents extracted from a zipfile, but the problem is that I can't
conveniently capture all the statuses of the spawned tasks and provide a
single email that collects and reports on all the statuses.  In addition,
there seems to be no way to report to the parent when all the spawned tasks
have completed.  I could periodically poll for all the spawned tasks to
complete, but that gets kind of messy.

 

Thank you!

 

Tim Meagher

 

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: Wednesday, April 07, 2010 9:51 AM
To: [email protected]
Subject: [MarkLogic Dev General] RE: tail-recursion with xdmp:spawn

 

Hi Mike,

 

Yes, that's a good approach overall for one-off processing. It doesn't
provide the robustness of CPF, but it can be easier to set up. It allows for
a multi-threaded approach to processing your documents by configuring the
number of threads on the task server. 

 

Your termination condition could use properties. When you have completed an
update on a document, add a property flag. Then your processing can look for
documents that do not have the property in place.

 

/foo[not(property::bar)] is fast.

/foo[not(property::bar = "baz")] is also fast.

You can also use cts:properties-query().
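
A minimal sketch of the property-flag idea, assuming a <bar> property as the flag and a made-up document URI (both are placeholders, not anything from a real setup). The first statement flags one document after its update; the second, in a later transaction, lists the /foo documents that still lack the flag:

```xquery
xquery version "1.0-ml";

(: Hypothetical URI, for illustration only. :)
declare variable $uri := "/docs/example.xml";

(: Flag the document as processed by setting a "bar" property. :)
xdmp:document-set-property($uri, <bar>baz</bar>)
;
xquery version "1.0-ml";

(: Separate transaction: URIs of /foo documents not yet flagged. :)
for $doc in /foo[not(property::bar)]
return xdmp:node-uri($doc)
```

Once every document carries the flag, the second statement returns the empty sequence, which is the termination condition.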

 

Hope this helps.

 

Also, I think there is new functionality coming in 4.2 that you will
appreciate. :-) Hope to see you at the user conference.

 

Kelly

 

Message: 4
Date: Wed, 07 Apr 2010 09:26:06 -0400
From: Mike Sokolov <[email protected]>
Subject: [MarkLogic Dev General] tail-recursion with xdmp:spawn
To: General Mark Logic Developer Discussion
      <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 

Perhaps this won't be news to others on the list, but I was so excited
to finally stumble on a solution to a problem I have been struggling
with for years, that I just had to share.

 

The problem: how to process a large number of documents using xquery only?

 

This can't be done easily because if all the work is done in a single
transaction, it eventually runs out of time and space.  But xquery
modules don't provide an obvious mechanism for flow control across
multiple transactions.

 

In the past I've done this by writing an "outer loop" in Java, and more
recently I tried using CPF.  The problem with Java is that it's
cumbersome to set up and requires some configuration to link it to a
database.  I had some success with CPF, but I found it to be somewhat
inflexible since it requires a database insert or update to trigger
processing.  It also requires a bit of configuration to get going.
Often I find I just want to run through a set of existing documents and
patch them up in some way or another (usually to clean up some earlier
mistake!)

 

Finally I hit on the solution: I wrote a simple script that fetches a
batch of documents to be updated and processes the updates.  Then, in a
new statement after ";" (which runs as a separate transaction), it logs
some indication of progress and re-spawns the same script if there is
more work to be done.  Presto - an iterative processor.  This
technique is a little sensitive to running away into an infinite loop if
you're not careful about the termination condition, but it has many
advantages over the other methods.
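
For anyone who wants to try it, here is a rough sketch of what such a script might look like. It is not the original script: the module path "/batch/patch-docs.xqy", the batch size of 100, and the <bar> property used as a "done" flag are all assumptions for illustration, and the path passed to xdmp:spawn must resolve to this same module on your server.

```xquery
xquery version "1.0-ml";

(: Statement 1 (update transaction): patch one batch of unflagged documents.
   The batch size and the <bar> flag are illustrative assumptions. :)
for $doc in (/foo[not(property::bar)])[1 to 100]
return
  (: Apply the real fix to $doc here, then flag the document as done. :)
  xdmp:document-set-property(xdmp:node-uri($doc), <bar>done</bar>)
;
xquery version "1.0-ml";

(: Statement 2 (separate transaction, so it sees the committed flags):
   log progress and re-spawn this module only if work remains.
   "/batch/patch-docs.xqy" is a hypothetical path to this same module. :)
if (fn:exists((/foo[not(property::bar)])[1]))
then (
  xdmp:log("patch-docs: batch finished, re-spawning"),
  xdmp:spawn("/batch/patch-docs.xqy")
)
else
  xdmp:log("patch-docs: no unflagged documents left, stopping")
```

The loop only terminates if every processed document actually receives the flag; a batch that leaves a document unflagged would be selected again on every pass.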

 

What do you think?

 

 

Michael Sokolov
Engineering Director
www.ifactory.com
@iFactoryBoston

 

PubFactory: the revolutionary e-publishing platform from iFactory

 

 

 

------------------------------

_______________________________________________

General mailing list

[email protected]

http://xqzone.com/mailman/listinfo/general
