Thanks! While it's the most elegant solution posted so far :) it's also slower than a dog with his feet frozen in mud. On my machine it took about 3 minutes to return 200 URLs.
It's OK though. I found a different way that's faster, and I'll save the details because the problem is actually more complex than the original question (and so the solution is different as well). But the core is that I ended up using cts:collection-match() ... turns out the collections in question have a common naming convention, so I didn't use a join after all ...

But thanks all for the interesting ideas!

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Evan Lenz
Sent: Thursday, November 17, 2011 6:44 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search

I think I've got it. You want to join between two lexicons. You can limit your collection URIs to those that are the same as doc URIs in another collection (like "/summaries" in your case). Enable the URI lexicon and join between it and the collection lexicon:

    cts:collections()[. = cts:uris("",(),cts:collection-query("/summaries"))]

Evan

From: Evan Lenz <evan.l...@marklogic.com>
Reply-To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Thu, 17 Nov 2011 15:22:29 -0800
To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search

Just brainstorming a bit, but if you enable the collection lexicon, then you could query for the list of existing collections using cts:collections(), or better yet, using cts:collection-match():

    for $uri in cts:collection-match("/logs/*")
    return xdmp:estimate(collection($uri))

I believe that "0" should not appear in the resulting list, because otherwise the collection wouldn't exist. (A collection only exists by virtue of a document being associated with it.) cts:collection-match("/logs/*") will return all the collection URIs matching that pattern, and, if I'm right that there's no such thing as an empty collection, you won't ever need to check whether it's empty. So it seems like you could confidently spawn collection deletes on all the existing "/logs/*" collections that way.

Evan

From: "Lee, David" <d...@epocrates.com>
Reply-To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Thu, 17 Nov 2011 11:41:15 -0800
To: "General Mark Logic Developer Discussion (general@developer.marklogic.com)" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] "Joins" in search:search or cts:search

I suspect the answer is "no" ... but just picking the brains out there ...

For good or bad I use this archetype.
I have many "summary" documents, say "/logs/1.xml", "/logs/2.xml", which belong to the collection "/summaries". There can be many (100k+). Each summary document lists references to external URLs (in this case Amazon S3) from which data could be loaded. If I load the data, I put each group into a collection named by the URL of the summary.

So say I have 10,000 XML documents referenced by doc("/logs/1.xml"). If I choose to load them, they will end up in collection "/logs/1.xml". The summaries themselves are in the collection "/summaries".

The reason for this is the ability to easily bulk delete blocks of documents based on their summaries. I can list the summaries and, by a simple exists(collection($url)), can tell if any actual log documents have been loaded.

NOW: I want to be able to delete all records by summary, but only if the documents have been loaded. Suppose I had 100k summary URLs. I could do

    for $url in collection("/summaries")/document-uri(.)
    return
      if (exists(collection($url)))
      then xdmp:collection-delete($url)
      else ()

This works and all ... but suppose I want something more efficient. Overall there may be only, say, 1% of the summary documents actually loaded. Furthermore, if there were LOTS of ones loaded, the above would time out. So I spawn a thread to delete, say, [1 to 10] of every summary collection ... but say I have 100k collections: most of the threads do nothing. So I have to revert to the above to first check if the collection has anything before spawning a thread.

Question: Is there a cts:search option which can do a collection query based on the results of the search itself? That is (pseudo code), in one cts:search:

    for $c in collection("x")/document-uri(.)
    where exists(collection($c))
    return $c

Doing this in FLWOR is very slow ... but it's what I'm resorting to ....

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224
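
For readers following the thread, here is a minimal sketch of what the cts:collection-match() approach mentioned in the replies might look like. It is not the exact solution that was eventually used (those details were not posted). It assumes the collection lexicon is enabled and that every loaded collection URI matches the "/logs/*" pattern from the example; both are assumptions taken from the discussion.

```xquery
xquery version "1.0-ml";

(: Assumption: the collection lexicon is enabled on this database.
   Assumption: loaded log collections are all named "/logs/...",
   as in the example from the thread. :)

(: Per the discussion above, the lexicon should only list collections
   that have at least one document, so no exists(collection(...))
   check is made here. :)
for $c in cts:collection-match("/logs/*")
return xdmp:collection-delete($c)
```

This runs every delete in a single request, so with many loaded collections it could still time out. In that case each collection URI could instead be handed to a separately spawned task, as suggested in the replies.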