Hi, Mike,

Thank you for the quick and informative response -- I expect the XQuery sorting 
URIs by forest will be very helpful. I'm not sure when I will post an outcome 
but did want to pass on my thanks.

-Brent

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Blakeley
Sent: Tuesday, September 23, 2008 12:32 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] CORB: Sleep during configurable hours and 
process 1 forest at a time

Brent,

Those are interesting ideas: I'll add them to my list of potential
enhancements to Corb. The Corb source code is fairly simple, and I
welcome patches.

Meanwhile, you can implement something like that per-Forest idea without
changing any Java code: just provide your own uris-module, as mentioned
at http://developer.marklogic.com/svn/corb/trunk/README.html

(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)

I gather that you already know about this mechanism, but let's explore
it further. To limit to a forest, you could write:

(: process the first forest :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $uris := cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

This means running corb with a different uris-module for each forest.
Alternatively, you could process the uris for each forest in series:

(: process forests in series :)
let $uris :=
   for $forest-id in xdmp:database-forests(xdmp:database())
   return cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

If you can formulate a query to select documents that haven't yet been
processed, you can include that as the third argument to cts:uris():
that should allow your code to resume processing quickly, if you
interrupt corb for peak-hour processing.
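
As a rough sketch of that resume idea: assuming your processing module
adds each finished document to a collection (the name 'corb-processed'
below is made up for illustration) and that the URI lexicon is enabled,
the uris-module might look something like this:

```xquery
(: resume: list only URIs not yet marked as processed :)
(: 'corb-processed' is a hypothetical collection name :)
let $todo := cts:not-query(cts:collection-query('corb-processed'))
let $uris := cts:uris('', 'document', $todo)
return (count($uris), $uris)
```

The same query can be passed together with a forest id, as in
cts:uris('', 'document', $todo, 1.0, $forest-id), to combine it with the
per-forest examples above. The processing module would be responsible
for adding each document to that collection, for example with
xdmp:document-add-collections().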

-- Mike

Hartwig, Brent (CL Tech Sv) wrote:
> Hello,
>
> Has anyone extended Corb to sleep during configurable periods or process one 
> forest at a time?
>
> We need to modify every object in our ML instance. Multiple merges are 
> saturating the IO channel. To keep production stable and usable, we intend to 
> put the job to sleep during peak hours and only process one forest at a time. 
> Each processed URI will go into a collection, allowing us to verify all are 
> processed. Preliminary approaches are described below. Your thoughts and 
> experience are welcome. Thank you in advance.
>
> Sleep: Nothing too concerning here (but tried & true is always better). We're 
> planning to work around backups and peak hours, and to allow time for system 
> resources to recover before peak hours resume.
>
> Forest: Corb can obtain a list of forests from the specified database via 
> Session.getContentbaseMetaData().getForestIds() and iterate in serial. The 
> queue would be populated once per forest by substituting the forest ID within 
> the provided URIS-MODULE. The initial implementation may impose some usage 
> constraints.
>
> -Brent
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
