You can also use xdmp:spawn to update one batch at a time.  You would then need 
two modules: the spawned module, which would typically declare an external 
variable used to pass in the URIs to process, and a driver module that works 
out the batches and passes each one to the spawned module.  Each spawned task 
runs as a separate request on the Task Server, so each batch commits in its own 
transaction.
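A rough, untested sketch of that two-module setup, applied to Gary's update.  
The module path /update-batch.xqy, the external variable name URIS, passing the 
batch as one newline-delimited string, and the batch size of 100 are all 
hypothetical choices for illustration.  The driver also assumes the URI lexicon 
is enabled on the database, since it uses cts:uris.

Worker module (spawned once per batch):

```xquery
xquery version "1.0-ml";
(: Hypothetical worker module, saved as /update-batch.xqy under the app
   server's modules root. Each spawned copy runs as its own request, so
   the updates for one batch commit in their own transaction. :)
declare default element namespace
  'http://developer.envisn.com/xmlns/envisn/netvisn/';

(: Newline-delimited URIs for this batch, passed in by the driver.
   Unprefixed variable names are in no namespace, so the default element
   namespace above does not affect this name. :)
declare variable $URIS as xs:string external;

for $uri in fn:tokenize($URIS, "\n")
let $d := fn:doc($uri)
let $lk := $d/auditHistory/lookupInfo
return (
  xdmp:node-replace($lk/parentDisplayPath,
    element auditParentDisplayPath { $lk/parentDisplayPath/text() }),
  xdmp:node-replace($lk/defaultName,
    element auditDefaultName { $lk/defaultName/text() }),
  xdmp:node-replace($lk/objectClass,
    element auditObjectClass { $lk/objectClass/text() }),
  for $u in $d/auditHistory//Action/user
  return xdmp:node-replace($u/username,
    element auditUserName { $u/username/text() })
)
```

Driver module (run once; it only reads URIs and queues tasks):

```xquery
xquery version "1.0-ml";
(: Driver: splits the URIs into batches and spawns one task per batch.
   No default element namespace is declared here, so the QName built
   below for the external variable stays in no namespace. :)

(: Hypothetical values; adjust as needed. :)
declare variable $WORKER as xs:string := "/update-batch.xqy";
declare variable $BATCH-SIZE as xs:integer := 100;

(: cts:uris assumes the URI lexicon is enabled on the database. :)
let $uris := cts:uris((), (), cts:collection-query("audit_history"))
let $batches := xs:integer(fn:ceiling(fn:count($uris) div $BATCH-SIZE))
for $i in 1 to $batches
let $batch := fn:subsequence($uris, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
(: Pass the batch as a single newline-joined string; the worker splits it. :)
return xdmp:spawn($WORKER,
  (fn:QName("", "URIS"), fn:string-join($batch, "&#10;")))
```

Because each batch commits separately, a failure part-way through leaves the 
earlier batches updated.  Restricting the driver to unprocessed documents makes 
a rerun safe, and with a very large number of batches the Task Server queue 
size may need checking.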

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Brent Hartwig
Sent: Monday, May 06, 2013 1:46 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Need help with mass updates

Hi, Gary,

When this is all in one transaction, it doesn't matter how you break it up: 
every document the request touches is still loaded and updated within that one 
request.  CORB (http://marklogic.github.io/corb/index.html) is built for exactly 
this purpose.  You provide two queries: one selects the documents to process, 
and the other processes the documents one at a time.  Each document is processed 
in a transaction of its own.

For the first query, it's good to come up with a way to select only unprocessed 
documents, unless you wish to reprocess everything.  That allows the process to 
be interrupted and later pick up where it left off.
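A minimal, untested sketch of the two CORB modules for Gary's update.  It assumes 
the CORB convention that the URIS module returns the number of URIs followed by 
the URIs themselves, and that the process module receives each URI in an 
external variable named URI.  Treating "still contains a netvisn 
parentDisplayPath element" as "not yet processed" is an assumption about Gary's 
documents, and cts:uris again assumes the URI lexicon is enabled.

URIS module (selects only documents that still need the update):

```xquery
xquery version "1.0-ml";
declare namespace nv = 'http://developer.envisn.com/xmlns/envisn/netvisn/';

(: Assumption: a document is unprocessed while it still has the old
   parentDisplayPath element; processed documents have the renamed
   auditParentDisplayPath instead, so a restarted run skips them. :)
let $uris := cts:uris((), (),
  cts:and-query((
    cts:collection-query('audit_history'),
    (: element-query over an empty and-query matches any occurrence :)
    cts:element-query(xs:QName('nv:parentDisplayPath'), cts:and-query(()))
  )))
(: CORB expects the count first, then the URIs. :)
return (fn:count($uris), $uris)
```

Process module (run by CORB once per URI, each in its own transaction):

```xquery
xquery version "1.0-ml";
declare default element namespace
  'http://developer.envisn.com/xmlns/envisn/netvisn/';

(: Set by CORB for each URI returned by the URIS module. :)
declare variable $URI as xs:string external;

let $d := fn:doc($URI)
let $lk := $d/auditHistory/lookupInfo
return (
  xdmp:node-replace($lk/parentDisplayPath,
    element auditParentDisplayPath { $lk/parentDisplayPath/text() }),
  xdmp:node-replace($lk/defaultName,
    element auditDefaultName { $lk/defaultName/text() }),
  xdmp:node-replace($lk/objectClass,
    element auditObjectClass { $lk/objectClass/text() }),
  for $u in $d/auditHistory//Action/user
  return xdmp:node-replace($u/username,
    element auditUserName { $u/username/text() })
)
```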

CORB is a Java program.  You get to configure the number of threads.

I couldn't say if there's now a standard feature that supersedes CORB.

-Brent

From: [email protected] 
[mailto:[email protected]] On Behalf Of Gary Larsen
Sent: Monday, May 06, 2013 4:36 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Need help with mass updates

Hi,

I have a query to update documents, but when there are many of them I get the 
dreaded XDMP-EXPNTREECACHEFULL error.  I've had luck avoiding this error when 
returning large result sets by processing the docs in segments [$start to 
$end], but it does not seem to help with the updates.

Is there a trick to performing mass updates?  Any advice would be appreciated.

xquery version "1.0-ml";
declare default element namespace
  'http://developer.envisn.com/xmlns/envisn/netvisn/';

let $cq := cts:collection-query('audit_history')

let $incr := 100
let $size := xdmp:estimate(cts:search(doc(), $cq, 'unfiltered'))
let $segs := ceiling($size div $incr)
return
  for $x in (1 to $segs)
  let $start := (($x - 1) * $incr) + 1
  let $end := $start + $incr - 1

  for $d in cts:search(doc(), $cq, 'unfiltered')[$start to $end]
  let $lk := $d/auditHistory/lookupInfo
  let $loc := element auditParentDisplayPath { $lk/parentDisplayPath/text() },
      $name := element auditDefaultName { $lk/defaultName/text() },
      $class := element auditObjectClass { $lk/objectClass/text() }
  return (
    xdmp:node-replace($lk/parentDisplayPath, $loc),
    xdmp:node-replace($lk/defaultName, $name),
    xdmp:node-replace($lk/objectClass, $class),

    for $u in $d/auditHistory//Action/user
    let $uname := element auditUserName { $u/username/text() }
    return xdmp:node-replace($u/username, $uname)
  )

Thanks,

Gary Larsen
Envisn Inc.
508-259-6465

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
