[MarkLogic Dev General] XPath performance with attribute lookup [ was Re: [MarkLogic Dev General] ReIndexing takes too long ]

Michael Blakeley Mon, 02 Mar 2009 13:20:21 -0800

Paul,

The best way to understand the performance of an absolute XPath expression is to trace its evaluation:


xdmp:query-trace(true()),
//*[@id eq 'a1234']

In ErrorLog.txt I see:

2009-03-02 12:58:21.340 Info: Docs: line 2: Analyzing path: collection()/descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 1 is searchable: collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 predicate 1 is conditionally searchable: @id eq "a1234"
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 is conditionally searchable: descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: First step of path is searchable: collection()
2009-03-02 12:58:21.340 Info: Docs: line 2: Gathering constraints.
2009-03-02 12:58:21.340 Info: Docs: line 2: Executing search.
2009-03-02 12:58:21.390 Info: Docs: line 2: Selected 30017 fragments to filter

The crucial step is "Gathering constraints": none were found. So the server has to scan all the fragments in the database (in my case, 30,017 of them) to find the matching fragments (0, in my case). This could drive a fair amount of disk I/O.

This happens because the server indexes attributes as element-attribute-value tuples. For best performance we should query that way, too. In most situations it is straightforward to enumerate all the possible parent elements:


  //(a|b|c|d)[@id eq 'a1234']
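
If the set of parent elements is not known up front, one way to collect it is a one-off discovery query. This is a minimal sketch, assuming the elements can be told apart by local name alone (namespaces are ignored here) and that a single slow pass over the database is acceptable:

```xquery
xquery version "1.0-ml";

(: Sketch only: lists the distinct local names of every element that
   carries an id attribute. The wildcard step has the same problem as
   the original query, so expect it to be about as slow. :)
distinct-values(collection()//*[@id]/local-name(.))
```

The names it returns are what would go inside the parentheses of the enumerated path above. It is something to run once while writing the query, not on every request.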

Now the query trace shows the constraints being used, and performance is much better:

...

2009-03-02 13:00:41.174 Info: Docs: line 2: Comparison contributed hash value constraint: a/@id = "a1234"

...

2009-03-02 13:00:41.181 Info: Docs: line 2: Comparison contributed hash value constraint: d/@id = "a1234"

...
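
The same element-attribute-value lookup can also be written explicitly with the cts search API. What follows is a sketch, not a drop-in replacement: a, b, c and d are the placeholder element names from the path above, assumed to be in no namespace, and the search returns the matching documents (fragments), not the individual elements.

```xquery
xquery version "1.0-ml";

(: Sketch: find documents containing an a, b, c or d element whose
   id attribute equals "a1234". Element names are placeholders.
   The "exact" option asks for a case-sensitive, unstemmed match,
   which is closest to the XPath "eq" comparison. :)
cts:search(
  collection(),
  cts:element-attribute-value-query(
    (xs:QName("a"), xs:QName("b"), xs:QName("c"), xs:QName("d")),
    xs:QName("id"),
    "a1234",
    "exact"))
```

An XPath step applied to the results can then pick out the element itself if that is what the caller needs.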

-- Mike

ps - It's generally considered polite to start a new thread for a new subject, or at least change the subject line (http://www.faqs.org/rfcs/rfc1855.html).


On 2009-03-02 11:26, Paul Vanderveen wrote:

I have an XPath/XQuery to find links in a set of documents that looks like
this:

//*[@id='A123456']

I am searching on somewhere around 15,000 documents, and this seems to take
several seconds to execute. I was wondering if there is a way to index the
ID attribute so that this can be accomplished much faster, or if there is a
better way to find the element that matches a specified ID.


Paul Vanderveen


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

