Thanks Jason, adding the word and element position indexes dropped this
query down to about 500ms.
Thanks!


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jason Hunter
Sent: Monday, December 21, 2009 12:43 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Help with slow query ....

On Dec 20, 2009, at 8:32 PM, Lee, David wrote:

> I've run into an interesting case of a very slow query.
> In the DB I have about 600,000 fragments (in about 6000 files).
> These are small fragments (about 300 bytes) with about 10 very short
> elements containing a short string or nothing.
> In MOST searches I get about 100ms result times but this one takes
> about 60 seconds.

You can catch the query-trace of it to see how (and how well) the
filtering is being applied.
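
A minimal sketch of turning tracing on for a single request, assuming the xdmp:query-trace built-in. The trace is written to the server error log rather than returned with the results, and the word query is shown here without the original options for brevity:

```xquery
xquery version "1.0-ml";

(: Enable query tracing for the remainder of this request.
   The trace output goes to the server error log. :)
xdmp:query-trace(fn:true()),

(: Simplified form of the slow search; options omitted for brevity. :)
cts:search(
  xdmp:directory("/RxNorm/rxnconso/")//RXNCONSO,
  cts:element-query(xs:QName("STR"), cts:word-query("ENG"))
)[1 to 10]
```

The log entries should show how many fragments the indexes selected before filtering, which is the number to compare against the final result count.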

> cts:search(
>   xdmp:directory("/RxNorm/rxnconso/")//RXNCONSO,
>   cts:element-query(xs:QName("STR"),
>     cts:word-query("ENG", ("case-insensitive", "diacritic-sensitive",
>       "punctuation-insensitive", "whitespace-insensitive",
>       "unstemmed", "wildcarded"))))[1 to 10]
>  
> What I think is going on here is that the term "ENG" is in every
> single fragment (it's a language code), so it's finding 600,000 fragments,
> but I'm constructing a search to limit the search to only "STR"
> elements, of which none contain "ENG".
> My guess as to what is happening is that ML is finding a "hit" in
> every fragment, but has to open up the fragment
> and search to discover that the hit was in the wrong element. The
> result is the empty sequence,
> but it takes a minute to get to that.

The docs on cts:element-query() explain which indexes can help it do its
job:

"Enabling both the word position and element position indexes ("word
position" and "element word position" in the database configuration
screen of the Admin Interface) will speed up query performance for many
queries that use cts:element-query. The position indexes enable
MarkLogic Server to eliminate many false-positive results, which can
reduce disk I/O and processing, thereby speeding the performance of many
queries. The amount of benefit will vary depending on your data."

Sounds like that's the most likely candidate; you don't have these
indexes so the query is seeing those false positives that appear in
other elements.
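
If you would rather script the change than use the Admin Interface, here is a rough sketch using the Admin API. The database name "RxNorm" is a placeholder I made up, so substitute your own. Existing content has to be reindexed before the new indexes help, which can take a while on a large database:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "RxNorm" is a hypothetical database name; replace it with yours. :)
let $db := xdmp:database("RxNorm")
let $config := admin:get-configuration()
(: Turn on the "word positions" setting. :)
let $config := admin:database-set-word-positions($config, $db, fn:true())
(: Turn on the "element word positions" setting. :)
let $config := admin:database-set-element-word-positions($config, $db, fn:true())
return admin:save-configuration($config)
```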

-jh-

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general