Re: [MarkLogic Dev General] cts query options

Tim Mon, 18 Mar 2013 11:42:08 -0700

Hi Mary,

I think you fundamentally answered my questions, i.e., when to choose one 
approach or another.  In my UI I have multiple search options, some of which 
may work better with either range or value queries.  For now I'm using all 
value queries, but was considering implementing range queries for suitable 
candidates.  I'm just not convinced that the performance gain justifies using 
the range queries.  I also wonder how useful it is to mix query types in 
a single cts:search by combining them via cts:and-query.

Thanks!

Tim

-----Original Message-----
From: Mary Holstege [mailto:[email protected]] 
Sent: Monday, March 18, 2013 1:29 PM
To: 'MarkLogic Developer Discussion'; Tim
Subject: Re: [MarkLogic Dev General] cts query options

On Mon, 18 Mar 2013 09:33:19 -0700, Tim <[email protected]> wrote:

> I'd like to get an idea what criteria folks use for selecting the cts 
> queries passed to cts:search.  My guess is that the 
> optimal queries are cts:element-range-query() and
> cts:element-attribute-range-query() used in conjunction with range 
> indexes.  I haven't always found these the best choice for cts queries 
> because there are no options for case, punctuation, diacritic, and 
> space insensitivity, but then again I don't know if that is inherently 
> addressed by the index collation.  In any case I'd like to get a feel 
> for how much faster it is to use the range queries instead of value 
> queries in conjunction with the cts:search query, and for that matter 
> when to use one versus the other.
>

Deciding whether to use a value query or a string range query is less about 
optimality and more about functionality. What is the question you are trying to 
answer?

The fundamental difference is that range queries are about looking at string 
values, perhaps in the context of a collation that knows how to regard certain 
differences as irrelevant, and value queries are about comparing token 
sequences.

Word tokens may be stemmed, and punctuation and space tokens are not indexed.  
The index for tokens may contain diacritic and case variants. None of these 
things is true of strings in a range index. That is, a value query can see that 
<p>He runs.</p> matches "he ran", but there is no way that could be true in a 
range query.
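To make the contrast concrete, here is a toy Python model. It is not MarkLogic's implementation or API: a value query is modeled as comparing normalized token sequences, and a range query as comparing whole strings through a collation key. The stemmer is a hypothetical three-entry lookup table, and the helper names are made up for illustration. In XQuery the equivalent knobs would be options on cts:element-value-query (e.g. "stemmed", "case-insensitive") or a collation on cts:element-range-query.

```python
import re

# Hypothetical toy stemmer: a tiny lookup table, nothing like MarkLogic's
# real dictionary-based stemming.
TOY_STEMS = {"runs": "run", "ran": "run", "running": "run"}


def tokenize(text):
    """Word tokens only: punctuation and whitespace are dropped, mirroring
    the point that those tokens are not indexed."""
    return re.findall(r"\w+", text)


def value_match(node_text, query, stemmed=True, case_insensitive=True):
    """Toy value query: the node's whole token sequence must equal the
    query's token sequence after optional case folding and stemming."""
    def norm(token):
        if case_insensitive:
            token = token.casefold()
        return TOY_STEMS.get(token, token) if stemmed else token
    return [norm(t) for t in tokenize(node_text)] == [norm(t) for t in tokenize(query)]


def range_equal(node_text, query, collation_key=lambda s: s):
    """Toy range query with '=': whole strings are compared after mapping
    each through a collation key (identity ~ codepoint collation)."""
    return collation_key(node_text) == collation_key(query)
```

Here value_match("He runs.", "he ran") is True, while range_equal("He runs.", "he ran") is False for any collation key that only folds case, punctuation, or the like: a collation can make "He runs." equal "he runs.", but nothing at the string level turns "runs" into "ran".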

At the boundary, where you specify exact unstemmed value queries or exact range 
queries with a codepoint collation, the results will line up. For exact queries 
there are universal index entries for the value that include punctuation and 
whitespace, but we don't index those tokens otherwise.
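Continuing the toy model (again Python, not MarkLogic code): when the value comparison keeps punctuation, whitespace, and case and does no stemming (roughly what MarkLogic's "exact" option bundles for value queries), and the range comparison uses plain codepoint ordering, both reduce to the same string equality, so their answers coincide. The function names and sample strings are hypothetical.

```python
import re


def exact_value_match(node_text, query):
    # Toy stand-in for the universal "exact" index entry: the full value,
    # punctuation, whitespace and case included, nothing stemmed.
    return node_text == query


def codepoint_range_equal(node_text, query):
    # A codepoint collation orders strings by Unicode code point, so the
    # range "=" comparison is also plain string equality.
    return node_text == query


def loose_value_match(node_text, query):
    # Non-exact toy value query for contrast: lowercase word tokens only,
    # punctuation and whitespace ignored.
    def tokens(s):
        return [t.lower() for t in re.findall(r"\w+", s)]
    return tokens(node_text) == tokens(query)


SAMPLES = ["He runs.", "he runs.", "He runs", "He  runs."]


def boundary_agrees(query):
    """True if the exact value query and the codepoint range query give the
    same answer for every sample node."""
    return all(
        exact_value_match(n, query) == codepoint_range_equal(n, query)
        for n in SAMPLES
    )
```

For the query "He runs.", the exact and codepoint versions each match only "He runs.", while loose_value_match matches all four samples, which is where the two kinds of query stop lining up.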

From a performance standpoint, the fundamental difference is that value 
queries pull from whatever terms are available in the inverted index, and range 
queries have to scan and compare strings in the range index.  Something like 
case insensitivity could be pre-baked into a range index or applied as an 
operation at query time; either way the comparisons need to be done, and it 
just depends on when you want to pay for them and how much flexibility you 
want at query time.  Things like case sensitivity in value queries are baked in 
at index time, provided you enable the appropriate index (e.g., fast case 
sensitive searches); otherwise you either have to depend on the filter or 
accept inaccurate results.
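A rough Python sketch of where that work lands, under loudly toy assumptions: the term index is a dict from keys to document-ID sets, the "exact-case" keys play the role of the fast case sensitive searches setting, "filtering" means re-checking candidates against the stored value, and the range index is a sorted list of (key, doc id) pairs. None of these names are MarkLogic APIs, and the real index structures and query plans differ.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical toy corpus: doc id -> the string value of one element.
DOCS = {1: "Apple", 2: "apple", 3: "Banana", 4: "APPLE"}


def build_term_index(docs, fast_case_sensitive):
    """Toy inverted index, built once at index time. Case-folded keys are
    always stored; exact-case keys only when the toy option is enabled."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[("folded", value.casefold())].add(doc_id)
        if fast_case_sensitive:
            index[("exact", value)].add(doc_id)
    return index


def case_sensitive_value_query(index, has_exact_keys, docs, text, filtered=True):
    """Case-sensitive lookup. With exact-case keys the index answers alone;
    otherwise folded keys give over-matching candidates that must either be
    filtered against the documents or returned as-is (inaccurate)."""
    if has_exact_keys:
        return set(index.get(("exact", text), ()))
    candidates = set(index.get(("folded", text.casefold()), ()))
    if filtered:
        # Toy "filter": re-check each candidate against its stored value.
        return {d for d in candidates if docs[d] == text}
    return candidates


def build_range_index(docs, key=lambda s: s):
    """Toy range index: (collation key, doc id) pairs, sorted at index time."""
    return sorted((key(value), doc_id) for doc_id, value in docs.items())


def range_equal_prebaked(range_index, text, key):
    """Collation baked into the index: binary-search the sorted keys.
    `key` must be the same function the index was built with."""
    k = key(text)
    lo = bisect_left(range_index, (k,))
    hi = bisect_right(range_index, (k, float("inf")))
    return {doc_id for _, doc_id in range_index[lo:hi]}


def range_equal_at_query_time(range_index, text, key):
    """Collation applied at query time: every stored value is re-keyed and
    compared, i.e. a full scan of the index."""
    k = key(text)
    return {doc_id for stored, doc_id in range_index if key(stored) == k}
```

With exact-case keys, a case-sensitive lookup for "Apple" returns {1} straight from the index; without them the folded key yields {1, 2, 4}, and only the filter narrows that back to {1}. On the range side, folding case into the index keys makes the lookup a binary search, while folding at query time forces a scan of every entry, which is the "when do you want to pay for it" trade-off.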

The other thing to remember is that we are constantly working on improving query 
optimization, so any performance-related advice you get today could be 
false tomorrow.

//Mary

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
