Hi Michael,

Thanks for your reply. It made my day! From over 4 secs to under 0.4
secs in less than two hours! :-)

On Sat, May 7, 2011 at 18:16, Michael Blakeley <[email protected]> wrote:
> As far as I am aware there is no optimization of XPath expressions across 
> variable bindings. So the existing query isn't using indexed lookups for much 
> of anything, but is evaluating many, many in-memory expressions. The evaluator 
> is traversing the entire document structure of everything in 
> $these_agreements for each code value, looking for matching nodes.
>
> There are three basic approaches to query optimization, which can be traded 
> off for specific use cases. We can reduce the expression count; we can 
> improve the use of indexes; we can reduce the number of database round-trips. 
> Let's start by trying to reduce the expression count. You could optimize this 
> a little bit by telling the evaluator that there will be only one match per 
> code, allowing it to stop as soon as it finds the first match.
>
> for $one_jurisdiction_code in $related_jurisdictions
>   (: all agreements between the two jurisdictions :)
>  let $these_agreements := (
>    $the_agreements/eoi:agreement[
>      eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]

I understand what you mean, but I have to get all agreement elements.
My return clause is probably misleading, as it suggests I'm interested
in only one value, which is not the case. I have to inspect all
agreements and find out whether at least one of them has been
"signed", "ratified" or "enforced".

> I'd expect a 50% improvement from that change. You could also eliminate one 
> node-traversal step per code-agreement pair by doing that work up front. I'm 
> not sure how much that will save, but it's easy, so it's worth a try.
>
> let $the_agreements := 
> collection('http://www.eoi-portal.org/agreements')/eoi:agreement
> ...
> for $one_jurisdiction_code in $related_jurisdictions
>   (: all agreements between the two jurisdictions :)
>  let $these_agreements := (
>    $the_agreements[
>      eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code] )[1]
>
> But I suspect it will be more efficient to repeat the collection call 
> instead. This adds to the number of database round-trips, but should greatly 
> reduce the expression count.
>
> let $collection-name := 'http://www.eoi-portal.org/agreements'
> for $one_jurisdiction_code in $related_jurisdictions
>   (: all agreements between the two jurisdictions :)
>  let $these_agreements := collection($collection-name)/eoi:agreement[
>    eoi:jurisdictions/eoi:jurisdiction eq $one_jurisdiction_code ]
> ...
>
> This will result in one call to collection() per code, but each call will use 
> an indexed lookup on the code. So I suspect it will be more efficient than 
> filtering all the agreements in memory for every code. If not, you could try 
> using a single collection call to pre-calculate a map, using the codes as map 
> keys.

That did the trick! That change alone made the query ten times faster!
Actually, looking at it a bit more closely, I also replaced the * with
eoi:agreement, which helped too.
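
For anyone finding this thread in the archive: the map-based variant
you mention would, as far as I understand it, look roughly like the
untested sketch below, using MarkLogic's map:map, map:put and map:get.
The eoi namespace URI, the two sample codes, the
local:agreements-by-code helper and the return clause are all
placeholders, not part of my real query.

```xquery
xquery version "1.0-ml";

(: Placeholder namespace URI: use the real eoi namespace here. :)
declare namespace eoi = "http://example.org/eoi";

(: Placeholder codes: the real query computes these elsewhere. :)
declare variable $related_jurisdictions as xs:string* := ("AA", "BB");

(: Hypothetical helper: files every agreement under each of its
   jurisdiction codes, so later lookups need no rescanning. :)
declare function local:agreements-by-code(
  $agreements as element(eoi:agreement)*
) as map:map
{
  let $map := map:map()
  return (
    for $agreement in $agreements
    for $code in $agreement/eoi:jurisdictions/eoi:jurisdiction
    let $key := string($code)
    (: map:put replaces the stored value, so append to what is there :)
    return map:put($map, $key, (map:get($map, $key), $agreement)),
    (: return the map after the puts in one sequence (common idiom) :)
    $map
  )
};

(: A single collection() call for all codes :)
let $by-code := local:agreements-by-code(
  collection('http://www.eoi-portal.org/agreements')/eoi:agreement)
for $one_jurisdiction_code in $related_jurisdictions
(: all agreements between the two jurisdictions, via a map lookup :)
let $these_agreements := map:get($by-code, $one_jurisdiction_code)
(: placeholder return clause, for illustration only :)
return
  <jurisdiction code="{$one_jurisdiction_code}"
    agreements="{count($these_agreements)}"/>
```

Whether this beats one indexed collection() call per code probably
depends on how many agreements there are, since it pulls all of them
into memory once.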


> Note that you don't need those .../text() steps. See 
> http://blakeley.com/wordpress/archives/518 for some discussion of that idiom.

Thanks for reminding me of your article, which I read a couple of
months ago and found very instructive.

boolean($these_agreements/eoi:enforced/text())

In this case, if all eoi:enforced elements are empty, this returns
false; if at least one contains a date, it returns true. I guess the
following expression would be equivalent but easier to understand?

boolean($these_agreements/eoi:enforced[text()])
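
To check this, the forms can be compared on a couple of made-up
agreements. The sketch below is plain XQuery 1.0 over in-memory sample
data; the eoi namespace URI and the local:checks helper are
placeholders. The third form leaves out text() entirely, along the
lines of your remark above, and tests the string value instead.

```xquery
xquery version "1.0";

(: Placeholder namespace URI: use the real eoi namespace here. :)
declare namespace eoi = "http://example.org/eoi";

(: Hypothetical helper: runs the three candidate tests on one set of agreements. :)
declare function local:checks($these_agreements as element()*) as xs:boolean*
{
  (: true if any eoi:enforced has a text child, via the text nodes :)
  boolean($these_agreements/eoi:enforced/text()),
  (: true if any eoi:enforced has a text child, via a predicate :)
  boolean($these_agreements/eoi:enforced[text()]),
  (: true if any eoi:enforced has a non-empty string value; no text() :)
  boolean($these_agreements/eoi:enforced[. ne ''])
};

(: Made-up sample data: one empty eoi:enforced, one holding a date. :)
let $empty := <eoi:agreement><eoi:enforced/></eoi:agreement>
let $dated := <eoi:agreement><eoi:enforced>2010-01-01</eoi:enforced></eoi:agreement>
return (local:checks($empty), local:checks(($empty, $dated)))
```

For this sample it should return (false, false, false, true, true,
true), so the two text() forms agree. The string-value test would
differ from them, for example, if the date were wrapped in a child
element of eoi:enforced, since text() only looks at direct text
children.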

> -- Mike

Thank you very much.
Jakob.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
