Paul,

The best way to understand the performance of an absolute XPath expression is to trace its evaluation:

xdmp:query-trace(true()),
//*[@id eq 'a1234']

In ErrorLog.txt I see:

2009-03-02 12:58:21.340 Info: Docs: line 2: Analyzing path: collection()/descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 1 is searchable: collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 predicate 1 is conditionally searchable: @id eq "a1234"
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 is conditionally searchable: descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: First step of path is searchable: collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Gathering constraints.
2009-03-02 12:58:21.340 Info: Docs: line 2: Executing search.
2009-03-02 12:58:21.390 Info: Docs: line 2: Selected 30017 fragments to filter

The crucial step is "Gathering constraints": none were found. So the server has to scan every fragment in the database (30,017 of them, in my case) to find the matching fragments (0, in my case). This can drive a fair amount of disk I/O.

This happens because the server indexes attributes as element-attribute-value tuples. For best performance we should query that way, too. In most situations it is straightforward to enumerate all the possible parent elements:

  //(a|b|c|d)[@id eq 'a1234']

Now the query trace shows the constraints being used, and performance is much better:

...
2009-03-02 13:00:41.174 Info: Docs: line 2: Comparison contributed hash value constraint: a/@id = "a1234"
...
2009-03-02 13:00:41.181 Info: Docs: line 2: Comparison contributed hash value constraint: d/@id = "a1234"
...
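
If the set of parent elements is not known up front, a one-off query can list every element name that carries an id attribute. This is only a sketch: it assumes the elements are in no namespace (local-name() drops any namespace), and it will most likely scan every fragment, so it is something to run once rather than per request.

```xquery
(: List the distinct local names of all elements that carry an @id attribute.
   Nothing constrains this path, so expect it to be slow on a large database. :)
distinct-values(
  for $e in //*[@id]
  return local-name($e)
)
```

The names it returns can then take the place of a, b, c and d in the path above.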

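The same element-attribute-value lookup can also be spelled out with the cts: functions. This is a sketch rather than a recommendation: a, b, c and d are the same placeholder element names as above, and the id attribute is assumed to be in no namespace. cts:search() returns the matching documents here, so the trailing path step picks out the element itself and re-checks the value exactly.

```xquery
(: Build one query that covers every candidate parent element. :)
let $id := "a1234"
let $query :=
  cts:element-attribute-value-query(
    (xs:QName("a"), xs:QName("b"), xs:QName("c"), xs:QName("d")),
    xs:QName("id"),
    $id)
(: Search all documents, then step down to the matching element. :)
return cts:search(doc(), $query)//(a|b|c|d)[@id eq $id]
```
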
-- Mike

ps - It's generally considered polite to start a new thread for a new subject, or at least change the subject line (http://www.faqs.org/rfcs/rfc1855.html).

On 2009-03-02 11:26, Paul Vanderveen wrote:
I have an XPath/XQuery to find links in a set of documents that looks like
this:

//*[@id='A123456']

I am searching on somewhere around 15,000 documents, and this seems to take
several seconds to execute. I was wondering if there is a way to index the
ID attribute so that this can be accomplished much faster, or if there is a
better way to find the element that matches a specified ID.


Paul Vanderveen


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
