The "uri lexicon" and "collection lexicon" database settings can be thought of 
as enabling these range indexes:

 * type=anyURI, namespace=http://marklogic.com/xdmp, localname=document

 * type=anyURI, namespace=http://marklogic.com/xdmp, localname=collection

Example calls:

(: This is like cts:uris() :)
cts:element-values(xs:QName("xdmp:document"))[1 to 10]

(: There's no other way to express this, cool eh :)
cts:element-value-co-occurrences(
  xs:QName("xdmp:document"),
  xs:QName("xdmp:collection")
)[1 to 10]

If you got a complaint, Evan, it's probably because you didn't have the two 
lexicons enabled.
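
For reference, both lexicons can be switched on in the Admin UI, or scripted. 
Here is a minimal sketch using the Admin API. It assumes the query runs 
against the content database itself (xdmp:database() with no argument returns 
the current database) and that the user has admin rights. Changing these 
settings may trigger a reindex, so treat this as an illustration rather than 
a recipe:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the database to change is the one this query runs against. :)
let $db := xdmp:database()
let $config := admin:get-configuration()
(: Enable the URI lexicon and the collection lexicon. :)
let $config := admin:database-set-uri-lexicon($config, $db, fn:true())
let $config := admin:database-set-collection-lexicon($config, $db, fn:true())
return admin:save-configuration($config)
```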

-jh-

On Nov 19, 2011, at 10:43 AM, Evan Lenz wrote:

> That sounds promising, but I don't think it would help much here, since the 
> aim is to find, given a set of document URIs, all the string-equal collection 
> URIs that exist (but in practice apply to different documents, i.e. do not 
> co-occur).
> 
> However, I'm still wondering: how do you express a co-occurrence call on 
> document and collection URIs? I tried using the QNames "xdmp:document" and 
> "xdmp:collection" with cts:element-value-co-occurrences() but the server 
> complained. Is there a more up-to-date way of doing this?
> 
> Evan
> 
> From: Kelly Stirman <[email protected]>
> Reply-To: General MarkLogic Developer Discussion 
> <[email protected]>
> Date: Sat, 19 Nov 2011 08:52:52 -0800
> To: "[email protected]" <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search 
> (Damon Feldman)
> 
> You could also do co-occurrence of the document and collection uris. 
> From: [email protected]
> Sent: 11/19/2011 8:16 AM
> To: [email protected]
> Subject: General Digest, Vol 89, Issue 80
> 
> 
> Today's Topics:
> 
>    1. Re: "Joins" in search: search or cts:search (Damon Feldman)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sat, 19 Nov 2011 08:15:50 -0800
> From: Damon Feldman <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or
>         cts:search
> To: General MarkLogic Developer Discussion
>         <[email protected]>
> Message-ID:
>         <d20c296d14127d4ebd176ad949d8a75a0600ce4...@exchg-be.marklogic.com>
> Content-Type: text/plain; charset="us-ascii"
> 
> Great solution.
> 
> My guess on performance is that it will be very good (and functional). 
> Calling cts:collections($uri, "limit=1") many times will (I assume) require 
> a bunch of binary searches through the URI lexicon, one for each URI in 
> /summaries, which may be slower than a single hash-based intersection. Then 
> again, it avoids pulling back the entire collection lexicon, and avoids the 
> procedural flavor of map:put().
> 
> I'd be interested to know how it performs in reality. That kind of thing is 
> where performance tuning becomes interesting.
> 
> Damon
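
(The map-based code Damon refers to is not quoted in this thread. Purely as 
an illustration of what a hash-based intersection of the two lexicons might 
look like, and not as a reconstruction of his actual solution, here is a 
minimal sketch. It assumes both lexicons are enabled and that the summaries 
live in the "/summaries" collection, as in David's original question.)

```xquery
xquery version "1.0-ml";

(: Load every collection URI into a map; this pulls back the whole
   collection lexicon, which is the cost mentioned above. :)
let $collections := map:map()
let $_ :=
  for $c in cts:collections()
  return map:put($collections, $c, fn:true())
(: Keep only the summary document URIs that are also collection URIs. :)
for $uri in cts:uris("", (), cts:collection-query("/summaries"))
where map:get($collections, $uri)
return $uri
```

Each lookup is a hash probe rather than a lexicon search, at the price of 
holding all collection URIs in memory for the duration of the query.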
> 
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Evan Lenz
> Sent: Saturday, November 19, 2011 1:51 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search
> 
> Glad you found a working solution!
> 
> But I hope you don't mind if I still try to rescue that frozen dog. :-)  
> Damon's solution using maps is probably the best, but I've been wondering if 
> there was a purely functional yet still performant way to do it. Then this 
> popped into my head while brushing my teeth tonight:
> 
> for $uri in cts:uris("",(),cts:collection-query("/summaries"))
> return cts:collections($uri,"limit=1")[. eq $uri]
> 
> Damon, does that fit the bill?
> 
> Evan
> 
> From: "Lee, David" <[email protected]<mailto:[email protected]>>
> Reply-To: General MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Date: Fri, 18 Nov 2011 07:21:53 -0800
> To: General MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Subject: Re: [MarkLogic Dev General] "Joins" in search: search or cts:search
> 
> Thanks!
> While it's the most elegant solution posted so far :), it's also slower 
> than a dog with his feet frozen in mud.
> On my machine it took about 3 minutes to return 200 URLs.
> 
> 
> It's OK though: I found a different way that's faster. I'll spare you the 
> details because the problem is actually more complex than the original 
> question (and so the solution is different as well).
> But the core is that I ended up using cts:collection-match() ... it turns 
> out the collections in question have a common naming convention, so I didn't 
> use a join after all ...
> 
> But thanks all for the interesting ideas !
> 
> 
> 
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]<mailto:[email protected]>
> 812-482-5224
> 
> From: 
> [email protected]<mailto:[email protected]>
>  [mailto:[email protected]] On Behalf Of Evan Lenz
> Sent: Thursday, November 17, 2011 6:44 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search
> 
> I think I've got it. You want to join between two lexicons. You can limit 
> your collection URIs to being those that are the same as doc URIs in another 
> collection (like "/summaries" in your case). Enable the URI lexicon and join 
> between it and the collection lexicon:
> 
> cts:collections()[. = cts:uris("",(),cts:collection-query("/summaries"))]
> 
> Evan
> 
> From: Evan Lenz <[email protected]<mailto:[email protected]>>
> Reply-To: General MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Date: Thu, 17 Nov 2011 15:22:29 -0800
> To: General MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Subject: Re: [MarkLogic Dev General] "Joins" in search:search or cts:search
> 
> Just brainstorming a bit, but if you enable the collection lexicon, then you 
> could query for the list of existing collections using cts:collections(), or 
> better yet, using cts:collection-match():
> 
> for $uri in cts:collection-match("/logs/*") return 
> xdmp:estimate(collection($uri))
> 
> I believe that "0" should not appear in the resulting list, because 
> otherwise the collection wouldn't exist. (A collection only exists by virtue 
> of a document being associated with it.) cts:collection-match("/logs/*") will 
> return all the collection URIs matching that pattern, and, if I'm right that 
> there's no such thing as an empty collection, you won't ever need to check 
> whether one is empty. So it seems like you could confidently spawn collection 
> deletes on all the existing "/logs/*" collections that way.
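
(To make the "spawn collection deletes" idea concrete, here is a rough 
sketch. The task module path "/tasks/delete-collection.xqy" and its external 
variable name "uri" are hypothetical, invented for this example; the 
collection lexicon is assumed to be enabled.)

```xquery
xquery version "1.0-ml";

(: Hypothetical task module, assumed to be installed as
   /tasks/delete-collection.xqy under the app server's modules root:

     xquery version "1.0-ml";
     declare variable $uri as xs:string external;
     xdmp:collection-delete($uri)
:)

(: Queue one delete task per existing "/logs/*" collection. :)
for $uri in cts:collection-match("/logs/*")
return xdmp:spawn("/tasks/delete-collection.xqy", (xs:QName("uri"), $uri))
```

Because only collections that actually exist come back from the lexicon, no 
task is queued for a summary whose data was never loaded.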
> 
> Evan
> 
> From: "Lee, David" <[email protected]<mailto:[email protected]>>
> Reply-To: General MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Date: Thu, 17 Nov 2011 11:41:15 -0800
> To: "General Mark Logic Developer Discussion 
> ([email protected]<mailto:[email protected]>)" 
> <[email protected]<mailto:[email protected]>>
> Subject: [MarkLogic Dev General] "Joins" in search:search or cts:search
> 
> I suspect the answer is "no" ... but just picking the brains out there ...
> 
> For good or bad, I use this archetype.
> 
> I have many "summary" documents, say "/logs/1.xml" and "/logs/2.xml", which 
> belong to the collection "/summaries".
> 
> There can be many (100k+).
> 
> Each summary document lists references to external URLs (in this case Amazon 
> S3) from which data could be loaded.
> If I load the data, I put each group into a collection named by the URL of 
> the summary.
> So say I have 10,000 XML documents referenced by doc("/logs/1.xml"). If I 
> choose to load them, they will end up in the collection "/logs/1.xml". The 
> summaries themselves are in the collection "/summaries".
> 
> The reason for this is the ability to easily bulk delete blocks of 
> documents based on their summaries.
> I can list the summaries and, by a simple
>                 exists( collection( $url) )
> 
> can tell if any actual log documents have been loaded.
> 
> 
> NOW: I want to be able to delete all records by summary, but only if the 
> documents have been loaded.
> Suppose I had 100k summary URLs; I could do
> 
>                 for $url in collection("/summaries")/document-uri(.)
>                 return
>                   if (exists(collection($url))) then
>                     xdmp:collection-delete($url)
>                   else ()
> 
> 
> This works and all ... but suppose I want something more efficient.
> Overall, only about 1% of the summary documents may actually be loaded. 
> Furthermore, if LOTS of them were loaded, the above would time out.
> 
> So I spawn a thread to delete, say, [1 to 10] of every summary collection ...
> but if I have 100k collections, most of the threads do nothing.
> So I have to revert to the above to first check whether the collection has 
> anything before spawning a thread.
> 
> Question: Is there a cts:search option which can do a collection query 
> based on the results of the search itself?
> That is (pseudo code),
> in one cts:search:
> 
>     for $c in collection("x")/document-uri(.)
>     where exists(collection($c))
>     return $c
> 
> Doing this in FLWOR is very slow ...
> but it's what I'm resorting to ...
> 
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]<mailto:[email protected]>
> 812-482-5224
> 
> ------------------------------
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
> 
> End of General Digest, Vol 89, Issue 80
> ***************************************
