There are techniques you can use to break this work up into smaller chunks, 
using xdmp:spawn <http://docs.marklogic.com/xdmp:spawn?q=xdmp:spawn>, for 
example. You should also take a look at xdmp:path 
<http://docs.marklogic.com/xdmp:path?q=xdmp:path>. However, before we dig too 
deep, I’d be interested in knowing what your end goal is. Are you trying to 
summarize the document structures in your database? Look for outliers? Do some 
sort of validation? Something else? The specific use case will likely influence 
the particular solution. Any details would be much appreciated.
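Here is a rough sketch of the xdmp:spawn idea: a driver query that queues one Task Server job per batch of documents. The module "/tasks/collect-paths.xqy", its external variables $start and $count, and the batch size of 1,000 are all hypothetical. The task module itself is not shown. Each task would compute the paths for its slice and write them to its own result document, so the tasks don't all contend for a single report document.

-----------------------
xquery version "1.0-ml";

(: Hypothetical driver. Queues one Task Server job per batch of documents
   in the '0NF9' collection. "/tasks/collect-paths.xqy" is an assumed
   module that declares external variables $start and $count and works on
   fn:subsequence(fn:collection('0NF9'), $start, $count).
   xdmp:estimate counts fragments, so this assumes one fragment per document. :)
let $batch-size := 1000
let $total := xdmp:estimate(fn:collection('0NF9'))
let $batches := xs:integer(fn:ceiling($total div $batch-size))
for $i in 1 to $batches
let $start := ($i - 1) * $batch-size + 1
return xdmp:spawn(
  "/tasks/collect-paths.xqy",
  (xs:QName("start"), $start, xs:QName("count"), $batch-size))
--------------

Slicing with fn:subsequence assumes the collection does not change while the tasks run. If documents are being added at the same time, the batches could overlap or skip documents.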

Justin


Justin Makeig
Director, Product Management
MarkLogic Corporation
[email protected]
www.marklogic.com



On Jul 26, 2013, at 2:52 AM, "Vaitkus, Arunas (LNG-LON)" <[email protected]> wrote:

Dear MarkLogic Users,

I have a real problem and have got stuck at a dead end (or so I think at the
moment!). I have a number of XML files loaded into MarkLogic Server and need to
run a report listing all possible absolute XPaths. This means I need to parse
each XML document, find each node’s unique absolute XPath (for example
“/full/xpath/namespace:to/test:specific/node”, without positional predicates,
as I don’t need to know a node’s position) and then insert the results into a
report in another XML file.
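Here is a minimal sketch of the per-document path computation described above, using xdmp:path from Justin's reply. The URI "/some/doc.xml" is a hypothetical placeholder. xdmp:path can include positional predicates such as "[2]", so the sketch strips them with a regular expression. How namespaced element names appear in the output is not shown here and is worth checking against real data.

-----------------------
xquery version "1.0-ml";

(: Minimal sketch: the distinct element paths in one document.
   "/some/doc.xml" is a hypothetical placeholder URI. xdmp:path may return
   positional predicates such as "[2]"; fn:replace strips them so that
   repeated siblings collapse to a single path. :)
let $doc := fn:doc("/some/doc.xml")
return fn:distinct-values(
  for $node in $doc//*
  return fn:replace(xdmp:path($node), "\[\d+\]", ""))
--------------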

It is fine with a small set of content (fewer than 2,000 files), as I hit the
expanded tree cache. However, in my case there are tens of thousands of files,
which makes everything very slow.

What I am looking for is whether there are any indexing capabilities that could
help me out here (in getting these XPaths).

Currently, my XQuery is very simple and does not make use of any indexes (as I
am not sure what I could change to benefit from them):

-----------------------
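xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

(: Distinct element paths, without positional predicates, across the
   first 20,000 documents in the '0NF9' collection. :)
let $paths := fn:distinct-values(
  for $node in fn:subsequence(fn:collection('0NF9'), 1, 20000)//*
  (: this is building textual full XPath representation, e.g. "/a/b/c" :)
  return fn:concat('/', fn:string-join($node/ancestor-or-self::*/fn:name(), '/'))
)
(: Paths already recorded in the report document. :)
let $existing := fn:doc('/example.xml')//xpath/fn:string()
let $today := fn:format-date(fn:current-date(), "[Y0001]-[M01]-[D01]")
(: Inserts are not visible until the statement commits, so de-duplicate
   in memory first and insert each new path exactly once. :)
for $path in $paths[fn:not(. = $existing)]
return xdmp:node-insert-child(
  fn:doc('/example.xml')/report,
  <xpath date="{$today}" approved="no">{$path}</xpath>)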
--------------

Many Thanks,
Arunas

________________________________

LexisNexis is a trading name of REED ELSEVIER (UK) LIMITED - Registered office 
- 1-3 STRAND, LONDON WC2N 5JR
Registered in England - Company No. 02746621

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
