Re: [MarkLogic Dev General] cts query options

Mary Holstege Mon, 18 Mar 2013 10:29:34 -0700

On Mon, 18 Mar 2013 09:33:19 -0700, Tim <[email protected]> wrote:

> I’d like to get an idea what criteria folks use for selecting cts search  
> queries associated with cts:search.  My guess is that the optimal  
> queries are cts:element-range-query() and  
> cts:element-attribute-range-query() used in conjunction with range  
> queries.  I haven’t always found these the best choice for cts queries  
> because there are no options for case, punctuation, diacritic, and space  
> insensitivity, but then again I don’t know if that is inherently  
> addressed by the index collation.  In any case I’d like to get a feel  
> for how much faster it is to use the range queries instead of value  
> queries in conjunction with the cts:search query and for that matter  
> when to use one versus the other.
>


Deciding whether to use a value query or a string range query is
less about optimality and more about functionality. What is the
question you are trying to answer?

The fundamental difference is that range queries are about
looking at string values, perhaps in the context of a collation that
knows how to regard certain differences as irrelevant, and value
queries are about comparing token sequences.

Word tokens may be stemmed, and punctuation and space tokens
are not indexed.  The index for tokens may contain diacritic and
case variants. None of these things is true of strings in a range
index. That is, a value query can see that <p>He runs.</p>
matches "he ran", but there is no way that could be true in
a range query.
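
To make the distinction concrete, here is a toy Python model. It is
not MarkLogic code and not how the server implements either query;
the stem table, the tokenizer, and the function names are all
made up for illustration.

```python
import re

# Hypothetical stem table; a stand-in for a real stemmer.
TOY_STEMS = {"runs": "run", "ran": "run", "running": "run"}

def tokens(text, stemmed=True):
    """Split into word tokens, dropping punctuation and whitespace
    and folding case; optionally map each word to its toy stem."""
    words = re.findall(r"\w+", text.lower())
    return [TOY_STEMS.get(w, w) if stemmed else w for w in words]

def value_match(content, query, stemmed=True):
    """Value-query style: compare token sequences."""
    return tokens(content, stemmed) == tokens(query, stemmed)

def range_match(content, query, key=lambda s: s):
    """Range-query style: compare whole strings under a collation key.
    The default key behaves like a codepoint comparison."""
    return key(content) == key(query)

print(value_match("He runs.", "he ran"))                    # True
print(range_match("He runs.", "he ran"))                    # False
print(range_match("He runs.", "he ran", key=str.casefold))  # False
```

Even with a case-folding key standing in for a case-insensitive
collation, the string comparison still sees "runs." and "ran" as
different strings; only the token-based comparison can treat them
as the same.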

At the boundary, where you specify exact unstemmed value
queries or exact range queries with a codepoint collation,
the results will line up. For exact queries there are universal
index entries for the value that include punctuation and
whitespace, but we don't index those tokens otherwise.

From a performance standpoint, the fundamental difference
is that value queries pull from whatever terms are available
in the universal index, and range queries have to scan and
compare strings in the range index.  Something like case
insensitivity could be pre-baked into a range index or it
could be applied as an operation at query time, so the
comparisons need to be done one way or another; it just
depends on when you want to pay for it, and how much
flexibility you want at query time.  In value queries,
things like case insensitivity are baked in at index
time, provided you enable the appropriate index (e.g.
fast case sensitive searches); otherwise you either have
to depend on the filter or accept inaccurate results.
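
The "pay now or pay later" trade-off can be sketched in Python as
well. Again, this is only a toy model under invented names and
data, not a description of MarkLogic's actual index structures.

```python
from collections import defaultdict

# Hypothetical data: document id -> string value.
docs = {1: "Apple", 2: "apple", 3: "Banana"}

# Option 1: pay at index time. Each key is case-folded once, when
# the index is built, so a lookup is a single dictionary probe.
prebaked = defaultdict(set)
for doc_id, value in docs.items():
    prebaked[value.casefold()].add(doc_id)

def lookup_prebaked(query):
    """Case-insensitive only; the original case was discarded."""
    return prebaked.get(query.casefold(), set())

# Option 2: pay at query time. The raw strings are kept, and every
# lookup scans them, folding case only if the caller asks for it.
raw = sorted((value, doc_id) for doc_id, value in docs.items())

def lookup_query_time(query, case_sensitive=True):
    fold = (lambda s: s) if case_sensitive else str.casefold
    return {doc_id for value, doc_id in raw if fold(value) == fold(query)}

print(lookup_prebaked("APPLE"))                          # {1, 2}
print(lookup_query_time("APPLE", case_sensitive=False))  # {1, 2}
print(lookup_query_time("apple"))                        # {2}
```

The pre-baked index answers the case-insensitive question cheaply
but cannot answer the case-sensitive one at all; the raw index can
answer both, at the cost of doing the comparisons on every query.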

The other thing to remember is that we are constantly working
on improving query optimization, so any performance-related
advice you get today could be false tomorrow.

//Mary


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
