Dear MarkLogic Users,

I have a real problem and have hit a dead end (or so I think at the moment!). I have a number of XML files uploaded to MarkLogic Server and need to run a report listing all possible absolute XPaths. This means that I need to parse each XML document, find each node's unique absolute XPath (for example "/full/xpath/namespace:to/test:specific/node", as I don't need positional predicates) and then insert the result as a report into another XML file.
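To illustrate what I mean by a unique absolute XPath, here is a minimal sketch on a made-up in-memory document. The element names and the variables `$doc` and `$paths` are hypothetical and only serve the illustration; they are not taken from my real content.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample document, not part of the real content :)
let $doc :=
  <full>
    <xpath><item/><item/></xpath>
    <other/>
  </full>

(: One path string per element: leading slash, no positional predicates :)
let $paths :=
  for $node in $doc/descendant-or-self::*
  return concat('/', string-join($node/ancestor-or-self::*/name(), '/'))

(: Repeated siblings produce the same string, so they collapse to one entry :)
return distinct-values($paths)
```

The two `item` siblings should yield a single "/full/xpath/item" entry, next to "/full", "/full/xpath" and "/full/other". The order of the returned values is not guaranteed.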
It works fine with a small set of content (fewer than 2000 files), as I hit the expanded tree cache. However, in my case there are tens of thousands of files, which makes everything very slow. What I am looking for is whether there are any indexing capabilities that could help me here (getting these XPaths). Currently, my XQuery is very simple and does not use any indexes (as I am not sure what I could change to get indexing gains here):

-----------------------
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

<report>
{
  for $collection in subsequence(collection('0NF9'), 1, 20000)
  for $document in $collection
  for $node in $document//*
  let $full-xpath := $node/string-join(ancestor-or-self::*/name(), '/') (: this is building textual full XPath representation :)
  let $row := fn:doc('/example.xml')//xpath
  return
    if (not($row = $full-xpath))
    then (
      xdmp:node-insert-child(
        doc("/example.xml")/report,
        (<xpath date="{format-date(fn:current-date(), "[Y0001]-[M01]-[D01]")}" approved="no">{$full-xpath}</xpath>)
      )
    )
    else ()
}
</report>
--------------

Many thanks,
Arunas
