On Mon, 18 Mar 2013 09:33:19 -0700, Tim <[email protected]> wrote:
> I’d like to get an idea what criteria folks use for selecting cts search
> queries associated with cts:search. My guess is that the optimal
> queries are cts:element-range-query() and
> cts:element-attribute-range-query() used in conjunction with range
> queries. I haven’t always found these the best choice for cts queries
> because there are no options for case, punctuation, diacritic, and space
> insensitivity, but then again I don’t know if that is inherently
> addressed by the index collation. In any case I’d like to get a feel
> for how much faster it is to use the range queries instead of value
> queries in conjunction with the cts:search query and for that matter
> when to use one versus the other.
>
Deciding whether to use a value query or a string range query is less about optimality and more about functionality. What is the question you are trying to answer?

The fundamental difference is that range queries are about looking at string values, perhaps in the context of a collation that knows how to regard certain differences as irrelevant, while value queries are about comparing token sequences. Word tokens may be stemmed, and punctuation and space tokens are not indexed. The index for tokens may contain diacritic and case variants. None of these things is true of strings in a range index. That is, a value query can see that <p>He runs.</p> matches "he ran", but there is no way that could be true in a range query.

At the boundary, where you specify exact unstemmed value queries or exact range queries with a codepoint collation, the results will line up. For exact queries there are universal index entries for the value that include punctuation and whitespace, but we don't index those tokens otherwise.

From a performance standpoint, the fundamental difference is that value queries pull from whatever terms are available in the universal index, while range queries have to scan and compare strings in the range index. Something like case insensitivity could be pre-baked into a range index (through its collation) or applied as an operation at query time. The comparisons need to be done one way or another; it just depends on when you want to pay for them, and how much flexibility you want at query time. Things like case sensitivity in value queries are baked in at index time, provided you enable the appropriate index (e.g. fast case sensitive searches); otherwise you either have to depend on the filter or accept inaccurate results.

The other thing to remember is that we are constantly working on improving query optimization, so any performance-related advice you get today could be false tomorrow.
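To make the functional difference concrete, here is a toy model in Python. It is only an illustration under stated assumptions, not MarkLogic code and not how its indexes are actually built: stem() and fold() are hypothetical stand-ins for a real stemmer and a real collation, value_query() compares token sequences the way a value query conceptually does, and the hypothetical RangeIndex class stores whole strings sorted by a collation key that is computed once, when the index is built (the "pre-baked" option).

```python
import bisect
import re
import unicodedata

# Toy model only: NOT MarkLogic's implementation of either index.
# stem() and fold() are hypothetical stand-ins for a real stemmer/collation.

def stem(word):
    # Tiny hypothetical stemmer: just enough to map "runs" and "ran" to "run".
    irregular = {"ran": "run"}
    if word in irregular:
        return irregular[word]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def fold(s):
    # Hypothetical collation key: case-fold, then strip combining diacritics.
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def tokens(text, exact=False):
    # Non-exact: word tokens only (punctuation and whitespace are dropped).
    # Exact: punctuation and whitespace are kept as tokens too, so joining
    # the tokens reproduces the original string.
    return re.findall(r"\w+|\W" if exact else r"\w+", text)

def value_query(text, query, exact=False):
    """Value-query style: compare token sequences.
    Non-exact mode lower-cases and stems each word token."""
    if exact:
        return tokens(text, exact=True) == tokens(query, exact=True)

    def norm(ts):
        return [stem(t.lower()) for t in ts]

    return norm(tokens(text)) == norm(tokens(query))

class RangeIndex:
    """Range-query style: whole string values, sorted by a collation key
    computed once at build time. collation=None means codepoint order."""

    def __init__(self, values, collation=None):
        self.key = collation or (lambda s: s)
        self.entries = sorted((self.key(v), v) for v in values)

    def equal(self, query):
        # '=' lookup: binary search on the key, then collect all ties.
        k = self.key(query)
        i = bisect.bisect_left(self.entries, (k,))
        hits = []
        while i < len(self.entries) and self.entries[i][0] == k:
            hits.append(self.entries[i][1])
            i += 1
        return hits

if __name__ == "__main__":
    values = ["He runs.", "he ran", "Caf\u00e9"]
    codepoint = RangeIndex(values)
    folded = RangeIndex(values, collation=fold)

    print(value_query("He runs.", "he ran"))   # True: stemmed, lower-cased tokens match
    print(codepoint.equal("he ran"))           # ['he ran']: "He runs." is a different string
    print(folded.equal("cafe"))                # ['Café']: case/diacritics folded in the index
    # At the boundary, exact value matching and codepoint range matching agree:
    print(value_query("He runs.", "He runs.", exact=True))  # True
    print(codepoint.equal("He runs."))                      # ['He runs.']
```

The design point the sketch is meant to show: the folded RangeIndex pays for case and diacritic folding once, at build time, but can then only answer questions in that one collation, whereas the token comparison in value_query() can stem and fold words in ways no string comparison in a range index ever will.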
//Mary

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
