Dear MarkLogic Users,

I have a problem and have hit a dead end (or so it seems at the moment!). I
have a number of XML files loaded into a MarkLogic server and need to run a
report listing every distinct absolute XPath. That is, I need to parse each XML
document, find each node's absolute XPath (for example
"/full/xpath/namespace:to/test:specific/node"; I don't need positional
predicates) and then insert the results as a report into another XML file.

This works fine for a small set of content (fewer than 2000 files), because the
documents are served from the expanded tree cache. However, in my case there
are tens of thousands of files, which makes everything very slow.

Are there any indexing capabilities that could help me in this case (that is,
in obtaining these XPaths)?

Currently, my XQuery is very simple and does not use any indexes (as I am not
sure what I could change to benefit from indexing here):

-----------------------
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

<report>
{
  for $collection in subsequence(collection('0NF9'), 1, 20000)
  for $document in $collection
  for $node in $document//*
  (: build the textual full XPath representation :)
  let $full-xpath := $node/string-join(ancestor-or-self::*/name(), '/')
  let $row := fn:doc('/example.xml')//xpath
  return
    if (not($row = $full-xpath))
    then xdmp:node-insert-child(
           doc("/example.xml")/report,
           <xpath date="{format-date(fn:current-date(), "[Y0001]-[M01]-[D01]")}"
                  approved="no">{$full-xpath}</xpath>)
    else ()
}
</report>
--------------
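
For comparison, the same report could be built by collecting the distinct paths
in memory first and inserting only the new ones afterwards. This is a minimal,
untested sketch that makes the same assumptions as the query above (collection
'0NF9', report stored at /example.xml). It reads /example.xml once instead of
once per node, and it does not rely on earlier inserts being visible to later
iterations of the same statement, which MarkLogic may not guarantee. It still
walks every document, so it does not answer the indexing question:

```xquery
xquery version "1.0-ml";

(: Sketch only: gather distinct paths first, then insert the unseen ones. :)
let $today := format-date(current-date(), "[Y0001]-[M01]-[D01]")

(: Paths already present in the report document, read once. :)
let $known := doc("/example.xml")/report/xpath/string()

(: One string per element, de-duplicated in memory. :)
let $paths :=
  distinct-values(
    for $node in subsequence(collection("0NF9"), 1, 20000)//*
    return string-join($node/ancestor-or-self::*/name(), "/")
  )

(: Insert only the paths that are not in the report yet. :)
for $path in $paths[not(. = $known)]
return
  xdmp:node-insert-child(
    doc("/example.xml")/report,
    <xpath date="{$today}" approved="no">{$path}</xpath>
  )
```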

Many Thanks,
Arunas
