Ivan,

If <binary-node/> has the same value as the URI you are checking against, then I think you can do the following:

1) Create an element range index of type string, with the codepoint collation, on <binary-node/>.
2) Iterate over the values in your list and check whether each one exists among your <binary-node/> values using cts:element-range-query() (http://developer.marklogic.com/pubs/4.1/apidocs/cts-query.html#cts:element-range-query), and return only the values that do not match as an XML report.

If you need to consider both the database URI and the value of <binary-node/>, then I suggest combining the two into a new element or attribute that you can put a range index on, and then following the same approach.

Range indexes are memory-mapped and much faster than retrieving full documents from disk. Even at 10ms/doc, 2M queries add up to about 20,000 seconds (roughly 5.5 hours) if you look at each document one after another. I think the range index approach will be at least an order of magnitude faster.

Others may have elaborations on this approach. For example, you could spawn each URI in your list to check the range index, and write to the doc properties if it doesn't match, per Geert's recommendation.

Kelly

Geert,

The task is to go through a list of string values and perform a simple operation for each of them. More precisely: I have about 2,000,000 URIs, which I received as a plain text document and then turned into XML by means of Perl. Each of them has the following structure:

    content/repository001/data/store001/location001/file.dat

and represents a path to a binary resource located in a remote data repository (nothing to do with MarkLogic). At the same time, /data/store001/location001/ is a directory on my MarkLogic server where a resource.xml file can be found.
In that file there is a <binary-resource> node which must contain the binary resource URI, so its value looks like the one described above:

    content/repository001/data/store001/location001/file.dat

What I need is to go over all 2,000,000 URIs in my list and check whether any of them are not referenced in the corresponding XML instances on MarkLogic, i.e. analyze.xqy does the following:

    define variable $uri as xs:string external
    (: $uri = "content/repository001/data/store001/location001/file.dat" :)

    (: Drop the first two path steps and the file name to get the directory :)
    let $path :=
      fn:concat(
        "/",
        fn:string-join(fn:tokenize($uri, "/")[3 to fn:last() - 1], "/"),
        "/"
      )
    (: $path = "/data/store001/location001/" :)
    return
      (: Checking reference :)
      if (xdmp:directory($path, "1")//binary-resource[1] = $uri)
      then <result path="{$path}">Check OK</result>
      else <result path="{$path}">WARNING: Resource not bound</result>

Apologies for the long message, I just wanted to make things clear.

Thanks,
Ivan
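
For reference, the range-index check that Kelly describes in the reply at the top of this thread could look roughly like the sketch below. It is only an illustration under several assumptions: that a string element range index with the codepoint collation already exists on binary-node (Ivan's message calls the element binary-resource, so the name would need adjusting), that the module runs as "1.0-ml" rather than the older syntax used in analyze.xqy, and that one URI is passed in per invocation as an external variable.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes a string element range index with the
   codepoint collation has been configured on binary-node. :)
declare variable $uri as xs:string external;

(: Match documents whose binary-node value equals $uri exactly. :)
let $query :=
  cts:element-range-query(
    xs:QName("binary-node"), "=", $uri,
    "collation=http://marklogic.com/collation/codepoint")
return
  (: xdmp:estimate answers from the indexes, so no document is
     read from disk; report only the URIs that have no match. :)
  if (xdmp:estimate(cts:search(fn:doc(), $query)) gt 0)
  then ()
  else <result uri="{$uri}">WARNING: Resource not bound</result>
```

Because nothing is returned for URIs that are found, collecting the output over the whole list would give just the unmatched values, which is the report described in step 2. Whether this is actually an order of magnitude faster would have to be measured on the real data.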
