Hi, Mike,

Thank you for the quick and informative response. I expect the XQuery that sorts URIs by forest will be very helpful. I'm not sure when I will post an outcome, but I did want to pass on my thanks.

-Brent

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Blakeley
Sent: Tuesday, September 23, 2008 12:32 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] CORB: Sleep during configurable hours and process 1 forest at a time

Brent,

Those are interesting ideas: I'll add them to my list of potential enhancements to Corb. The Corb source code is fairly simple, and I welcome patches.

Meanwhile, you can implement something like that per-Forest idea without changing any Java code: just provide your own uris-module, as mentioned at http://developer.marklogic.com/svn/corb/trunk/README.html

(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)

I gather that you already know about this mechanism, but let's explore it further. To limit to a forest, you could write:

(: process the first forest :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $uris := cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

This means running corb with a different uris-module for each forest. Alternatively, you could process the uris for each forest in series:

(: process forests in series :)
let $uris :=
  for $forest-id in xdmp:database-forests(xdmp:database())
  return cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

If you can formulate a query to select documents that haven't yet been processed, you can include that as the third argument to cts:uris(): that should allow your code to resume processing quickly, if you interrupt corb for peak-hour processing.

-- Mike

Hartwig, Brent (CL Tech Sv) wrote:
> Hello,
>
> Has anyone extended Corb to sleep during configurable periods or process one
> forest at a time?
>
> We need to modify every object in our ML instance. Multiple merges are
> saturating the IO channel. To keep production stable and usable, we intend to
> put the job to sleep during peak hours and only process one forest at a time.
> Each processed URI will go into a collection, allowing us to verify all are
> processed. Preliminary approaches are described below. Your thoughts and
> experience are welcome. Thank you in advance.
>
> Sleep: Nothing too concerning here (but tried & true is always better). We're
> planning to work around backups, peak hours and allow time for system
> resources to recover before peak hours resume.
>
> Forest: Corb can obtain a list of forests from the specified database via
> Session.getContentbaseMetaData().getForestIds() and iterate in serial. The
> queue would be populated once per forest by substituting the forest ID within
> the provided URIS-MODULE. The initial implementation may impose some usage
> constraints.
>
> -Brent
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
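
The resume idea from Mike's reply (a query as the third argument to cts:uris()) can be combined with the collection mentioned in the original question. The following is only a sketch under stated assumptions, not something posted in the thread: it assumes the processing module adds each finished document to a collection, and the collection name "corb-processed" is hypothetical.

```xquery
xquery version "1.0-ml";

(: URIS-MODULE sketch: list only documents that are not yet in the
   "processed" collection, one forest at a time, in series.
   ASSUMPTION: the processing module adds each finished document to this
   collection. The collection name is hypothetical. :)
declare variable $PROCESSED-COLLECTION as xs:string := "corb-processed";

(: select documents that are NOT in the processed collection :)
let $todo := cts:not-query(cts:collection-query($PROCESSED-COLLECTION))
let $uris :=
  for $forest-id in xdmp:database-forests(xdmp:database())
  return cts:uris('', 'document', $todo, 1.0, $forest-id)
return (count($uris), $uris)
```

For this to work, the processing module would presumably need to mark each document as done, for example with xdmp:document-add-collections($URI, "corb-processed"). If corb is interrupted for peak hours and restarted later, the uris-module should then return only the documents that remain.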
