Brent,

Those are interesting ideas: I'll add them to my list of potential enhancements to Corb. The Corb source code is fairly simple, and I welcome patches.

Meanwhile, you can implement something like your per-forest idea without changing any Java code: just provide your own uris-module, as described at http://developer.marklogic.com/svn/corb/trunk/README.html

(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)

I gather that you already know about this mechanism, but let's explore it further. To limit the URIs to a single forest, you could write:

(: process the first forest :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $uris := cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

This means running corb with a different uris-module for each forest. Alternatively, you could process the uris for each forest in series:

(: process forests in series :)
let $uris :=
  for $forest-id in xdmp:database-forests(xdmp:database())
  return cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

If you can formulate a query that selects documents that haven't yet been processed, you can pass it as the third argument to cts:uris(). That should let your code resume quickly after you interrupt corb during peak hours, because documents that are already done never enter the queue again.
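
For example, assuming your process module adds each finished document to a collection (the name 'corb-processed' below is hypothetical, so substitute whatever you actually use), a sketch of a resumable uris-module might look like this:

(: resume: select only documents not yet marked as processed :)
(: 'corb-processed' is an assumed collection name, not a Corb convention :)
let $todo := cts:not-query(cts:collection-query('corb-processed'))
let $uris :=
  for $forest-id in xdmp:database-forests(xdmp:database())
  return cts:uris('', 'document', $todo, 1.0, $forest-id)
return (count($uris), $uris)

The matching step in the process module would be something like xdmp:document-add-collections($URI, 'corb-processed'), where $URI is the external variable that Corb passes in. If that call runs in the same update as your change, a document should be marked only when its modification actually commits.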

-- Mike

Hartwig, Brent (CL Tech Sv) wrote:
Hello,

Has anyone extended Corb to sleep during configurable periods or process one 
forest at a time?

We need to modify every object in our ML instance. Multiple merges are 
saturating the IO channel. To keep production stable and usable, we intend to 
put the job to sleep during peak hours and only process one forest at a time. 
Each processed URI will go into a collection, allowing us to verify all are 
processed. Preliminary approaches are described below. Your thoughts and 
experience are welcome. Thank you in advance.

Sleep: Nothing too concerning here (but tried & true is always better). We're 
planning to work around backups and peak hours, and to allow time for system 
resources to recover before peak hours resume.

Forest: Corb can obtain a list of forests from the specified database via 
Session.getContentbaseMetaData().getForestIds() and iterate in serial. The 
queue would be populated once per forest by substituting the forest ID within 
the provided URIS-MODULE. The initial implementation may impose some usage 
constraints.

-Brent



------------------------------------------------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
