There are techniques you can use to break this work up into smaller chunks, using xdmp:spawn <http://docs.marklogic.com/xdmp:spawn?q=xdmp:spawn>, for example. You should also take a look at xdmp:path <http://docs.marklogic.com/xdmp:path?q=xdmp:path>. However, before we dig too deep, I’d be interested in knowing what your end goal is. Are you trying to summarize the document structures in your database? Look for outliers? Do some sort of validation? Something else? The specific use case will likely influence the particular solution. Any details would be much appreciated.
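As a rough illustration of the batching idea, here is a minimal sketch using xdmp:spawn. It is only one possible arrangement, not a tested solution. The module path /collect-paths.xqy, the /reports/ URI prefix, the batch size of 1000, and the external variable names start and size are all assumptions made for the example. The collection name '0NF9' comes from the original query. The driver below queues one task per batch on the Task Server:

```xquery
xquery version "1.0-ml";

(: Driver: queue one task per batch of documents.
   The batch size and the module path are hypothetical. :)
let $size  := 1000
let $total := xdmp:estimate(fn:collection("0NF9"))
for $i in 1 to xs:integer(fn:ceiling($total div $size))
let $start := ($i - 1) * $size + 1
return
  xdmp:spawn(
    "/collect-paths.xqy",
    (xs:QName("start"), $start, xs:QName("size"), $size)
  )
```

The spawned module (assumed to be saved as /collect-paths.xqy under the modules root) collects the distinct paths for its batch in memory and writes them with a single insert, rather than updating one shared report document once per node:

```xquery
xquery version "1.0-ml";

(: Task: both variables are supplied by the driver through xdmp:spawn. :)
declare variable $start as xs:integer external;
declare variable $size as xs:integer external;

(: Distinct element paths for this batch, without positional predicates. :)
let $paths :=
  fn:distinct-values(
    for $node in fn:subsequence(fn:collection("0NF9"), $start, $size)//*
    return fn:concat("/", fn:string-join($node/ancestor-or-self::*/fn:name(), "/"))
  )
return
  (: One partial report per batch; the URI scheme is hypothetical. :)
  xdmp:document-insert(
    fn:concat("/reports/paths-", $start, ".xml"),
    <report>{
      for $p in $paths
      return <xpath approved="no">{$p}</xpath>
    }</report>
  )
```

A final pass could merge the partial reports by taking fn:distinct-values over their xpath elements. xdmp:path is another way to get a node's path as a string; it is worth checking whether its output format (for example, positional predicates) suits the report before swapping it in for the string-join.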
Justin

Justin Makeig
Director, Product Management
MarkLogic Corporation
[email protected]
www.marklogic.com

On Jul 26, 2013, at 2:52 AM, "Vaitkus, Arunas (LNG-LON)" <[email protected]> wrote:

Dear MarkLogic Users,

I have a real problem and have hit a dead end (or so I think at the moment!). I have a number of XML files uploaded to MarkLogic Server and need to run a report listing all possible absolute XPaths. This means that I need to parse each XML document, find each node's unique absolute XPath (for example "/full/xpath/namespace:to/test:specific/node", as I don't need to know about position, so no predicates), and then insert a report into another XML file.

It is fine with a small set of content (fewer than 2000 files) as I hit expanded tree cache. However, in my case there are tens of thousands of files, which makes everything very slow. What I am looking for is whether there are any indexing capabilities that could help me out in this case (getting these XPaths)?
Currently, my XQuery is very simple and does not use any indexing (as I am not sure what I could change to get indexing gains here):

-----------------------
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

<report>
{
  for $collection in subsequence(collection('0NF9'), 1, 20000)
  for $document in $collection
  for $node in $document//*
  let $full-xpath := $node/string-join(ancestor-or-self::*/name(), '/') (: this is building textual full XPath representation :)
  let $row := fn:doc('/example.xml')//xpath
  return
    if (not($row=$full-xpath))
    then (xdmp:node-insert-child(doc("/example.xml")/report, (<xpath date="{format-date(fn:current-date(), "[Y0001]-[M01]-[D01]")}" approved="no">{$full-xpath}</xpath>)))
    else ()
}
</report>
--------------

Many Thanks,
Arunas

________________________________
LexisNexis is a trading name of REED ELSEVIER (UK) LIMITED - Registered office - 1-3 STRAND, LONDON WC2N 5JR
Registered in England - Company No. 02746621
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
