The "uri lexicon" and "collection lexicon" database settings can be thought of as enabling these range indexes:
* type=anyURI, namespace=http://marklogic.com/xdmp, localname=document
* type=anyURI, namespace=http://marklogic.com/xdmp, localname=collection

Example calls:

    (: This is like cts:uris() :)
    cts:element-values(xs:QName("xdmp:document"))[1 to 10]

    (: There's no other way to express this, cool eh :)
    cts:element-value-co-occurrences(
      xs:QName("xdmp:document"),
      xs:QName("xdmp:collection")
    )[1 to 10]

If you got a complaint, Evan, it's probably because you didn't have the two lexicons enabled.

-jh-

On Nov 19, 2011, at 10:43 AM, Evan Lenz wrote:

> That sounds promising, but I don't think it would help much here, since the
> aim is to find, given a set of document URIs, all the string-equal collection
> URIs that exist (but in practice apply to different documents, i.e. do not
> co-occur).
>
> However, I'm still wondering: how do you express a co-occurrence call on
> document and collection URIs? I tried using the QNames "xdmp:document" and
> "xdmp:collection" with cts:element-value-co-occurrences() but the server
> complained. Is there a more up-to-date way of doing this?
>
> Evan
>
> From: Kelly Stirman <[email protected]>
> Reply-To: General MarkLogic Developer Discussion <[email protected]>
> Date: Sat, 19 Nov 2011 08:52:52 -0800
> To: "[email protected]" <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search (Damon Feldman)
>
> You could also do co-occurrence of the document and collection uris.
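
For what it's worth, here is a minimal sketch of unpacking the pairs that the co-occurrence call in this message returns. It assumes both lexicons are enabled on the database, and that each result is a cts:co-occurrence element holding two cts:value children (the document URI first, then the collection URI). The [1 to 10] limit and the output string are only for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes the URI lexicon and the collection lexicon
   are both enabled on the target database. :)
for $pair in cts:element-value-co-occurrences(
               xs:QName("xdmp:document"),
               xs:QName("xdmp:collection"))[1 to 10]
(: Each $pair is assumed to be a cts:co-occurrence element with two
   cts:value children: the document URI, then the collection URI. :)
let $doc-uri := fn:string($pair/cts:value[1])
let $coll-uri := fn:string($pair/cts:value[2])
return fn:concat($doc-uri, " is in collection ", $coll-uri)
```

As Evan points out, this only pairs URIs that sit on the same document, so by itself it does not find collection URIs that are string-equal to some other document's URI.
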
> From: [email protected]
> Sent: 11/19/2011 8:16 AM
> To: [email protected]
> Subject: General Digest, Vol 89, Issue 80
>
> Message: 1
> Date: Sat, 19 Nov 2011 08:15:50 -0800
> From: Damon Feldman <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search
> To: General MarkLogic Developer Discussion <[email protected]>
>
> Great solution.
>
> My guess on performance is that it will be very good (and functional). To do
> cts:collections($uri, "limit=1") many times will (I assume) have to do a
> bunch of binary searches through the URI lexicon for each URI in /summaries,
> which may be slower than a single hash-based intersection, but then again it
> avoids pulling back the entire collection lexicon, and avoids the procedural
> flavor of map:put().
>
> I'd be interested to know how it performs in reality. That kind of thing is
> where performance tuning becomes interesting.
>
> Damon
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Evan Lenz
> Sent: Saturday, November 19, 2011 1:51 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search
>
> Glad you found a working solution!
>
> But I hope you don't mind if I still try to rescue that frozen dog. :-)
> Damon's solution using maps is probably the best, but I've been wondering if
> there was a purely functional yet still performant way to do it. Then this
> popped into my head while brushing my teeth tonight:
>
>   for $uri in cts:uris("",(),cts:collection-query("/summaries"))
>   return cts:collections($uri,"limit=1")[. eq $uri]
>
> Damon, does that fit the bill?
>
> Evan
>
> From: "Lee, David" <[email protected]>
> Reply-To: General MarkLogic Developer Discussion <[email protected]>
> Date: Fri, 18 Nov 2011 07:21:53 -0800
> To: General MarkLogic Developer Discussion <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search
>
> Thanks!
> While the most elegant solution so far posted :)
> It's also slower than a dog with his feet frozen in mud.
> On my machine it took about 3 minutes to return 200 URLs.
>
> It's OK though, I found a different way that's faster, and I'll save the details
> because the problem is actually more complex than the original question (and
> so the solution is different as well).
> But the core is I ended up using collection-match() ... turns out the
> collections in question have a common naming convention so I didn't use a
> join after all ...
>
> But thanks all for the interesting ideas!
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Evan Lenz
> Sent: Thursday, November 17, 2011 6:44 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search
>
> I think I've got it. You want to join between two lexicons.
> You can limit
> your collection URIs to being those that are the same as doc URIs in another
> collection (like "/summaries" in your case). Enable the URI lexicon and join
> between it and the collection lexicon:
>
>   cts:collections()[. = cts:uris("",(),cts:collection-query("/summaries"))]
>
> Evan
>
> From: Evan Lenz <[email protected]>
> Reply-To: General MarkLogic Developer Discussion <[email protected]>
> Date: Thu, 17 Nov 2011 15:22:29 -0800
> To: General MarkLogic Developer Discussion <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search
>
> Just brainstorming a bit, but if you enable the collection lexicon, then you
> could query for the list of existing collections using cts:collections(), or
> better yet, using cts:collection-match():
>
>   for $uri in cts:collection-match("/logs/*") return
>   xdmp:estimate(collection($uri))
>
> I believe that "0" should not appear in the resulting list, because
> otherwise, the collection wouldn't exist. (A collection only exists by virtue
> of a document being associated with it.) cts:collection-match("/logs/*") will
> return all the collection URIs matching that pattern, and since, if I'm right
> that there's no such thing as an empty collection, you won't ever need to
> check if it's empty. So it seems like you could confidently spawn collection
> deletes on all the existing "/logs/*" collections that way.
>
> Evan
>
> From: "Lee, David" <[email protected]>
> Reply-To: General MarkLogic Developer Discussion <[email protected]>
> Date: Thu, 17 Nov 2011 11:41:15 -0800
> To: "General Mark Logic Developer Discussion ([email protected])" <[email protected]>
> Subject: [MarkLogic Dev General] "Joins" in search:search or cts:search
>
> I suspect the answer is "no" ...
> but just plugging the brains out there ..
>
> For good or bad I use this archetype.
>
> I have many "summary" documents, say "/logs/1.xml", "/logs/2.xml", which
> belong to the collection "/summaries".
>
> There can be many (100k+).
>
> Each summary document lists references to external URLs (in this case Amazon
> S3) from which data could be loaded.
> If I load the data I put each group into a collection named by the URL of the
> summary.
> So say I have 10,000 XML documents referenced by doc("/logs/1.xml"). If I
> choose to load them, they will end up in collection
> "/logs/1.xml". These summaries are in the collection, say, "/summaries".
>
> The reason for this is the ability to easily bulk delete blocks of
> documents based on their summaries.
> I can list the summaries and by a simple
>
>   exists( collection( $url) )
>
> can tell if any actual log documents have been loaded.
>
> NOW: I want to be able to delete all records by summary but only if the
> documents have been loaded.
> Suppose I had 100k summary URLs, I could do
>
>   for $url in collection("/summaries")/document-uri(.)
>   return
>     if (exists(collection($url))) then
>       xdmp:collection-delete($url)
>     else ()
>
> This works and all ... but suppose I want something more efficient.
> Overall there may be only say 1% of the summary documents actually loaded.
> Furthermore if there were LOTS of ones loaded the above would time out.
>
> So I spawn a thread to delete say [1 to 10] of every summary collection ...
> but say I have 100k collections, most of the threads do nothing.
> So I have to revert to the above to first check if the collection has
> anything before spawning a thread.
>
> Question: Is there a cts:search option which can do a collection query
> based on the results of the search itself?
> That is (pseudo code), in one cts:search:
>
>   for $c in collection("x")/document-uri(.)
>   where exists(collection($c))
>   return $c
>
> Doing this in FLWOR is very slow ...
> but it's what I'm resorting to ....
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
