As far as I am aware there is no optimization of XPath expressions across 
variable bindings. So the existing query isn't using indexed lookups for much 
of anything, but is evaluating many many in-memory expressions. The evaluator 
is traversing the entire document structure of everything in $these_agreements 
for each code value, looking for matching nodes.

There are three basic approaches to query optimization, which can be traded off 
for specific use cases. We can reduce the expression count; we can improve the 
use of indexes; we can reduce the number of database round-trips. Let's start 
by trying to reduce the expression count. You could optimize this a little bit 
by telling the evaluator that there will be only one match per code, allowing 
it to stop as soon as it finds the first match.

for $one_jurisdiction_code in $related_jurisdictions
   (: all agreements between the two jurisdictions :)
  let $these_agreements := (
    $the_agreements/eoi:agreement[
      eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]

I'd expect a 50% improvement from that change. You could also eliminate one 
node-traversal step per code-agreement pair by doing that work up front. I'm 
not sure how much that will save, but easy so it's worth a try.

let $the_agreements := 
collection('http://www.eoi-portal.org/agreements')/eoi:agreement
...
for $one_jurisdiction_code in $related_jurisdictions
   (: all agreements between the two jurisdictions :)
  let $these_agreements := (
    $the_agreements[
      eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]

But I suspect it will be more efficient to repeat the collection call instead. 
This adds to the number of database round-trips, but should greatly reduce the 
expression count.

let $collection-name := 'http://www.eoi-portal.org/agreements'
for $one_jurisdiction_code in $related_jurisdictions
   (: all agreements between the two jurisdictions :)
  let $these_agreements := collection($collection-name)/eoi:agreement[
    eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code ]
...

This will result in one call to collection() per code, but each call will use 
an indexed lookup on the code. So I suspect it will be more efficient than 
filtering all the agreements in memory for every code. If not, you could try 
using single collection call to pre-calculate a map, using the codes as map 
keys.

Note that you don't need those .../text() steps. See 
http://blakeley.com/wordpress/archives/518 for some discussion of that idiom.

After optimizing this code, $related_jurisdictions may turn into a hot-spot. If 
so, you could add a range index on eol:jurisdiction and replace the XPath with 
a cts:element-values() call.

-- Mike

On 7 May 2011, at 08:12 , Jakob Fix wrote:

> hi, I'm trying to optimize a query that runs several seconds instead
> of subseconds.
> the number of documents in the database is very small (~4000), and the
> documents themselves are in the sub-1K range.
> 
> running the profiler on the code below shows that an inordinate amount
> of time is spent in this line
>   let $these_agreements :=
> $the_agreements/*[/eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code]
> 
> (btw, replacing the * with eoi:agreement which is the only root
> element doesn't change a thing.)
> 
> profiler output (first three lines, sorted by shallow us):
> 
> .main:  18    
> $the_agreements/eoi:agreement[/eoi:jurisdictions/eoi:jurisdiction
> eq $one_jurisdiction_code]    107     71      1802951 98      2482631
> .main:  18    /eoi:jurisdictions/eoi:jurisdiction     335552  23      588809  
> 23      588809
> .main:  18    /eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code        335552  3.6     90871   27      679680
> 
> i've created an element range index for the eoi:jurisdiction element
> in order to speed things up, but the result is the same.  so
> apparently that's not the right thing.
> i would be grateful for a hint as to how make this query quicker.
> thanks, Jakob.
> 
> 
> xquery version "1.0-ml";
> 
> declare namespace eoi = "http://www.oecd.org/eoi";;
> import module namespace functx = "http://www.functx.com"; at
> "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";
> 
> let $the_agreements := collection('http://www.eoi-portal.org/agreements')
> let $code := "DE"
> let $related_jurisdictions := functx:value-except(
>    fn:collection('http://www.eoi-portal.org/agreements')
>        /*[eoi:jurisdictions/eoi:jurisdiction eq
> $code]/eoi:jurisdictions/eoi:jurisdiction,
>        ($code)
>    )
> 
> for $one_jurisdiction_code in $related_jurisdictions
>    (: all agreements between the two jurisdictions :)
>   let $these_agreements :=
> $the_agreements/*[/eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code]
> 
>   let $is_inforce := boolean($these_agreements/eoi:enforced/text())
>   let $is_ratified := boolean($these_agreements/eoi:ratified/text())
>   let $is_signed := boolean($these_agreements/eoi:signed/text())
> 
> return (
>    $one_jurisdiction_code,
>    $is_inforce,
>    $is_ratified,
>    $is_signed
> )
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to