As far as I am aware there is no optimization of XPath expressions across
variable bindings. So the existing query isn't using indexed lookups for much
of anything, but is evaluating many many in-memory expressions. The evaluator
is traversing the entire document structure of everything in $these_agreements
for each code value, looking for matching nodes.
There are three basic approaches to query optimization, which can be traded off
for specific use cases. We can reduce the expression count; we can improve the
use of indexes; we can reduce the number of database round-trips. Let's start
by trying to reduce the expression count. You could optimize this a little bit
by telling the evaluator that there will be only one match per code, allowing
it to stop as soon as it finds the first match.
for $one_jurisdiction_code in $related_jurisdictions
(: all agreements between the two jurisdictions :)
let $these_agreements := (
$the_agreements/eoi:agreement[
eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]
I'd expect a 50% improvement from that change. You could also eliminate one
node-traversal step per code-agreement pair by doing that work up front. I'm
not sure how much that will save, but easy so it's worth a try.
let $the_agreements :=
collection('http://www.eoi-portal.org/agreements')/eoi:agreement
...
for $one_jurisdiction_code in $related_jurisdictions
(: all agreements between the two jurisdictions :)
let $these_agreements := (
$the_agreements[
eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]
But I suspect it will be more efficient to repeat the collection call instead.
This adds to the number of database round-trips, but should greatly reduce the
expression count.
let $collection-name := 'http://www.eoi-portal.org/agreements'
for $one_jurisdiction_code in $related_jurisdictions
(: all agreements between the two jurisdictions :)
let $these_agreements := collection($collection-name)/eoi:agreement[
eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code ]
...
This will result in one call to collection() per code, but each call will use
an indexed lookup on the code. So I suspect it will be more efficient than
filtering all the agreements in memory for every code. If not, you could try
using single collection call to pre-calculate a map, using the codes as map
keys.
Note that you don't need those .../text() steps. See
http://blakeley.com/wordpress/archives/518 for some discussion of that idiom.
After optimizing this code, $related_jurisdictions may turn into a hot-spot. If
so, you could add a range index on eol:jurisdiction and replace the XPath with
a cts:element-values() call.
-- Mike
On 7 May 2011, at 08:12 , Jakob Fix wrote:
> hi, I'm trying to optimize a query that runs several seconds instead
> of subseconds.
> the number of documents in the database is very small (~4000), and the
> documents themselves are in the sub-1K range.
>
> running the profiler on the code below shows that an inordinate amount
> of time is spent in this line
> let $these_agreements :=
> $the_agreements/*[/eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code]
>
> (btw, replacing the * with eoi:agreement which is the only root
> element doesn't change a thing.)
>
> profiler output (first three lines, sorted by shallow us):
>
> .main: 18
> $the_agreements/eoi:agreement[/eoi:jurisdictions/eoi:jurisdiction
> eq $one_jurisdiction_code] 107 71 1802951 98 2482631
> .main: 18 /eoi:jurisdictions/eoi:jurisdiction 335552 23 588809
> 23 588809
> .main: 18 /eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code 335552 3.6 90871 27 679680
>
> i've created an element range index for the eoi:jurisdiction element
> in order to speed things up, but the result is the same. so
> apparently that's not the right thing.
> i would be grateful for a hint as to how make this query quicker.
> thanks, Jakob.
>
>
> xquery version "1.0-ml";
>
> declare namespace eoi = "http://www.oecd.org/eoi";
> import module namespace functx = "http://www.functx.com" at
> "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";
>
> let $the_agreements := collection('http://www.eoi-portal.org/agreements')
> let $code := "DE"
> let $related_jurisdictions := functx:value-except(
> fn:collection('http://www.eoi-portal.org/agreements')
> /*[eoi:jurisdictions/eoi:jurisdiction eq
> $code]/eoi:jurisdictions/eoi:jurisdiction,
> ($code)
> )
>
> for $one_jurisdiction_code in $related_jurisdictions
> (: all agreements between the two jurisdictions :)
> let $these_agreements :=
> $the_agreements/*[/eoi:jurisdictions/eoi:jurisdiction eq
> $one_jurisdiction_code]
>
> let $is_inforce := boolean($these_agreements/eoi:enforced/text())
> let $is_ratified := boolean($these_agreements/eoi:ratified/text())
> let $is_signed := boolean($these_agreements/eoi:signed/text())
>
> return (
> $one_jurisdiction_code,
> $is_inforce,
> $is_ratified,
> $is_signed
> )
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general