It seems like those use cases could be implemented more efficiently without 
cts:uri-match. The first one could be done with cts:uris and 
cts:directory-query. The second could use exists(doc($uri)), or cts:uris with 
cts:directory-query.
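
A rough sketch of both, assuming the URI lexicon is enabled (cts:uris needs 
it). The id values are made up for illustration, and I'm guessing that 
"infinity" is the depth you want, since a trailing * in the match pattern is 
not limited to immediate children:

```xquery
xquery version "1.0-ml";

(: Hypothetical ids, for illustration only. :)
let $projectId := "123"
let $contentId := "456"

(: Case 1: all URIs under /project/{id}/jobs/.
   cts:directory-query expects a directory URI ending in "/".
   Depth "infinity" includes all descendants; "1" would return
   immediate children only. :)
let $jobs-dir := "/project/" || $projectId || "/jobs/"
let $job-uris :=
  cts:uris((), (), cts:directory-query($jobs-dir, "infinity"))

(: Case 2: check for one exact URI without touching the lexicon. :)
let $uri := "/project/" || $projectId || "/content/" || $contentId
let $found := fn:exists(fn:doc($uri))

return ($job-uris, $found)
```

Neither form involves pattern matching against the lexicon, so it may be worth 
profiling them against the cts:uri-match versions.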

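Depending on the work cts:uri-match decides to do, it may need to scan the 
entire URI lexicon for matches. That's O(n) in the number of URIs, probably 
something like 1M URIs/sec. With the 483,475 docs you mention, a full scan 
would take roughly half a second, which is in the same range as the time your 
profile attributes to the call.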

-- Mike

> On Oct 23, 2014, at 01:05, Rachel Wilson <[email protected]> wrote:
> 
> Hi,
> 
> I was wondering if anyone had a reply to this.  
> 
> We're digging even deeper into improving our performance for an API and in 
> several places (because we use it liberally) cts:uri-match ends up being the 
> bottleneck.  We are happy to redesign our data and queries where we can to 
> avoid it, but it continues to surprise us that this is the case because we 
> thought the uris are indexed and the function is designed to use wildcards 
> because it's a matcher.
> 
> A typical call would be
> 
>    let $uris := cts:uri-match("/project/" || $projectId ||"/jobs/*",
> 
> But we're most surprised by this one, which we used as a test, because there 
> aren't even any wildcards.
> 
>    let $thereShouldBeOnlyOne := cts:uri-match("/project/" || $projectId || 
> "/content/" || $contentId)
> 
> Some insight into the inner workings of that function would be great.
> 
> 
> From: Rachel Wilson <[email protected]>
> Date: Thursday, 16 October 2014 17:25
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: Surprising slowness of cts:uri-match
> 
> In our experience cts:uri-match is surprisingly slow.  For example when 
> profiling a pretty complicated query taking 0.7 seconds, the single 
> cts:uri-match() call takes 70-80% of the total time.  (Shallow% and Deep% 
> being the same)
> 
> But we thought it would be reading the URI lexicon, and so in a database with 
> only 483,475 docs it should be lightning fast.  We've had to stop using 
> cts:uri-match calls in loops for this reason.
> 
> Are there any match patterns to be avoided perhaps?  Wildcards in the middle 
> of the pattern, rather than trailing wildcards for example?
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general