Paul,
The best way to understand the performance of an absolute XPath
expression is to trace its evaluation:
xdmp:query-trace(true()),
//*[@id eq 'a1234']
In ErrorLog.txt I see:
2009-03-02 12:58:21.340 Info: Docs: line 2: Analyzing path:
collection()/descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 1 is searchable:
collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 predicate 1 is
conditionally searchable: @id eq "a1234"
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 is conditionally
searchable: descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: First step of path is
searchable: collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Gathering constraints.
2009-03-02 12:58:21.340 Info: Docs: line 2: Executing search.
2009-03-02 12:58:21.390 Info: Docs: line 2: Selected 30017 fragments to
filter
The crucial step is "Gathering constraints": the trace lists none. So
the server has to filter every fragment in the database (30,017 in my
case) to find the matching fragments (0, in my case). This could drive
a fair amount of disk I/O.
This happens because the server indexes attributes as
element-attribute-value tuples. For best performance we should query
that way, too. In most situations it is straightforward to enumerate all
the possible parent elements:
//(a|b|c|d)[@id eq 'a1234']
Now the query trace shows the constraints being used, and performance is
much better:
...
2009-03-02 13:00:41.174 Info: Docs: line 2: Comparison contributed hash
value constraint: a/@id = "a1234"
...
2009-03-02 13:00:41.181 Info: Docs: line 2: Comparison contributed hash
value constraint: d/@id = "a1234"
...
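If the set of parent elements is not known up front, one way to
enumerate it is a one-off discovery query. This is only a sketch: it
assumes the attribute is named id, as in the queries above, and I have
not traced it. Because it has the same //* shape as the original query,
it will probably have to filter every fragment too, so treat it as
something to run once, not on every request.

```xquery
(: One-off discovery: list the distinct names of all elements that
   carry an id attribute. Assumes the attribute is named "id". :)
distinct-values(
  for $e in //*[@id]
  return name($e)
)
```

The names it returns would take the place of a, b, c and d in the
union step.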
-- Mike
ps - It's generally considered polite to start a new thread for a new
subject, or at least change the subject line
(http://www.faqs.org/rfcs/rfc1855.html).
On 2009-03-02 11:26, Paul Vanderveen wrote:
I have an XPath/XQuery to find links in a set of documents that looks like
this:
//*[@id='A123456']
I am searching on somewhere around 15,000 documents, and this seems to take
several seconds to execute. I was wondering if there is a way to index the
ID attribute so that this can be accomplished much faster, or if there is a
better way to find the element that matches a specified ID.
Paul Vanderveen
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general