Hi, I'm trying to use Kelly's approach.
On Thu, Jun 25, 2009 at 15:41, Kelly Stirman <[email protected]> wrote:
> Here's one way I would do it in MarkLogic:
>
> 1) create a range index on your link element or attribute.
> 2) iterate over all the unique links using cts:element-values()
> 3) either spawn or invoke a module that a) checks whether the link is valid,
> then b) records that it was invalid.
>
> You might put the invalid entries back as documents in the database, or on
> their parent documents as properties or an attribute on the link - there are
> several options here.
>
> This allows you to have lots of simultaneous "threads" working. If you spawn
> each link, then the number of threads is configurable on the task server
> configuration screen, and you can also check the status page to see how your
> process is coming along. Note that spawning has the disadvantage of not
> surviving server restarts, which is why you may decide to use CPF to process
> each document as it is inserted or updated.
>
> Kelly
>

I know I have created a range index correctly (following the instructions in the admin guide, 19.5 Defining Element Range Indexes):

admin user interface: mydatabase->element range indexes
    scalar type: string
    namespace uri: [none]
    localname: DOI
    collation: http://marklogic.com/collation/
    range value positions: false

For good measure I restarted ML, and another visit to this page shows that the information is entered correctly.

Running the following code from CQ correctly (and very quickly) returns all values:

    xquery version "1.0-ml";
    cts:element-values(xs:QName("DOI"))

In my XQuery code, I have this line, which should work (i.e. return a sequence of strings):

    let $all-dois := cts:element-values(xs:QName("DOI"))

which throws this error (copy/paste from OxygenXML):

SystemID: unknown
Severity: error
Description: DBG-EVALERROR: dbg:value(10297136874332754404) -- No return value. Evaluation encountered an error.
Start location: 2:0

This doesn't look like an exception for an index that cannot be found ... I'm not sure what to do now. Also, one of the examples below used the same approach ... So, is it something in my module (see end of message)?

I've found the following references on the web which were useful for me:

- http://xquery.typepad.com/xquery/2007/08/xquery-and-lazy.html
  (one of the few examples on the web of using cts:element-values)
- http://zeus.riskfocusinc.com/portal/display/ART/Loading+Large+FpML+data+into+MarkLogic
  (an example of loading lots of data into ML using spawn or invoke)

The following is the complete code of the module (largely inspired by the riskfocusinc.com code):

    xquery version "1.0-ml";

    (: cycle over all DOI values using cts:element-values() and
       xdmp:spawn a "thread" for each :)
    for $doi in cts:element-values(xs:QName("DOI"))
    return
      <div>
      {
        try {
          xdmp:spawn(
            "/app/backend/query-doi.xqy",
            (xs:QName("doi"), $doi),
            <b>tested {$doi} ...</b>
          )
        } catch( $e) {
          <span>
            Problem while checking {$doi}: <br/>
            <i xmlns:e="http://marklogic.com/xdmp/error">{$e/e:message/text()}</i>
          </span>
        }
      }
      </div>

Your help is greatly appreciated.

cheers,
Jakob.
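
One thing that may be worth ruling out in the module above: the optional third parameter of xdmp:spawn is an options node, and the <b>tested ...</b> progress element is currently being passed in that position. The following is a minimal sketch of a variant that instead returns the progress element alongside the spawn call. It assumes that /app/backend/query-doi.xqy declares an external variable named doi, and it is not confirmed that this is what causes the DBG-EVALERROR reported by the debugger.

```xquery
xquery version "1.0-ml";

(: Prefix for MarkLogic error elements, as used in the original module. :)
declare namespace e = "http://marklogic.com/xdmp/error";

(: Cycle over all DOI values from the range index and spawn one task per value. :)
for $doi in cts:element-values(xs:QName("DOI"))
return
  <div>
  {
    try {
      (
        (: Only the module path and the variable bindings are passed to xdmp:spawn. :)
        xdmp:spawn("/app/backend/query-doi.xqy", (xs:QName("doi"), $doi)),
        (: The progress marker is returned as a sibling item, not as spawn options. :)
        <b>tested {$doi} ...</b>
      )
    } catch ($e) {
      <span>
        Problem while checking {$doi}: <br/>
        <i>{$e/e:message/text()}</i>
      </span>
    }
  }
  </div>
```

Running the module directly in CQ rather than through the Oxygen debugger may also surface the underlying error message instead of the generic DBG-EVALERROR wrapper.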
