Brent,
Those are interesting ideas: I'll add them to my list of potential
enhancements to Corb. The Corb source code is fairly simple, and I
welcome patches.
Meanwhile, you can implement something like that per-forest idea without
changing any Java code: just provide your own uris-module, as described
at http://developer.marklogic.com/svn/corb/trunk/README.html
(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)
I gather that you already know about this mechanism, but let's explore
it further. To limit processing to a single forest, you could write:
(: process the first forest :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $uris := cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)
This means running corb once per forest, each time with a different
uris-module. Alternatively, you could process the URIs for each forest in series:
(: process forests in series :)
let $uris :=
for $forest-id in xdmp:database-forests(xdmp:database())
return cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)
If you can formulate a query that selects only documents that haven't yet
been processed, you can pass it as the third argument to cts:uris().
That should let your job resume quickly if you interrupt corb for
peak-hour processing.
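For example, since you plan to put each processed URI into a collection,
a resumable uris-module could exclude that collection with cts:not-query().
This is only a sketch: the collection name "processed" is a hypothetical
placeholder, and it assumes the URI lexicon is enabled (as the examples
above already do).

```xquery
(: resumable URIS-MODULE sketch: first forest only, skipping documents
   already in the hypothetical "processed" collection :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $unprocessed := cts:not-query(cts:collection-query("processed"))
let $uris := cts:uris('', 'document', $unprocessed, 1.0, $forest-id)
return (count($uris), $uris)
```

On the other side, the process module would tag each document as it
finishes with it. Corb hands each URI to that module as the external
variable $URI; the actual modification is left as a comment here.

```xquery
(: process-module sketch: Corb supplies the current URI as $URI :)
declare variable $URI as xs:string external;

(: ... modify the document at $URI here ... :)

(: mark the document as done, so the uris-module above skips it on restart :)
xdmp:document-add-collections($URI, "processed")
```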
-- Mike
Hartwig, Brent (CL Tech Sv) wrote:
Hello,
Has anyone extended Corb to sleep during configurable periods or process one
forest at a time?
We need to modify every object in our ML instance. Multiple merges are
saturating the IO channel. To keep production stable and usable, we intend to
put the job to sleep during peak hours and only process one forest at a time.
Each processed URI will go into a collection, allowing us to verify that all
have been processed. Preliminary approaches are described below. Your thoughts
and experience are welcome. Thank you in advance.
Sleep: Nothing too concerning here (but tried and true is always better). We
plan to work around backups and peak hours, and to allow time for system
resources to recover before peak hours resume.
Forest: Corb can obtain the list of forests for the specified database via
Session.getContentbaseMetaData().getForestIds() and iterate over them serially.
The queue would be populated once per forest by substituting the forest ID into
the provided URIS-MODULE. The initial implementation may impose some usage
constraints.
-Brent
------------------------------------------------------------------------
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general