Re: [MarkLogic Dev General] hyphens and cts:element-value-query

Mary Holstege Tue, 28 Feb 2017 13:53:08 -0800

Just to summarize the ins and outs here in one place, because I'm seeing a 
certain amount of confusion:


* xdmp:plan is your friend: it will show you the questions we ask the indexes. 
If some information from your query is not reflected in the plan, that is a 
case where you may get false positives from index resolution (i.e. unfiltered 
search).

* Punctuation and space tokens are not indexed as words in the universal index. 
Therefore, word queries involving whitespace or punctuation will not make use 
of whitespace or punctuation in index resolution, regardless of space or 
punctuation sensitivity.

* Punctuation and space tokens are not generally indexed for value queries 
either. As a special exception, however, the universal index does have terms 
for "exact" value queries (unstemmed, case-sensitive, diacritic-sensitive, 
whitespace-sensitive, punctuation-sensitive), so "exact" value queries should 
be resolvable properly from the index, but only if you have 
fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in 
the database.

* For field word or value queries you can modify what counts as punctuation or 
whitespace via tokenizer overrides. This can turn what would have been a phrase 
into a single word.

* Outside of the special case given for exact value queries, all queries 
involving space or punctuation are phrase queries. Word and value search is not 
string matching.

* Space- and punctuation-insensitive does not mean tokenization-insensitive. 
"foo-bar" will not match "foobar" as a value query or a word query, regardless 
of your punctuation sensitivity. Word and value search is not string matching.

* String range queries are about string matching. Whether there is a match 
depends on the collation, but there is no tokenization happening, no stemming, 
ever.

* If the plan for cts:element-value-query(xs:QName("x"),"value-1","exact") 
doesn't include the hyphen, and you do have fast-case-sensitive-searches and 
fast-diacritic-sensitive-searches enabled in the database, that is a bug.
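
To make the points above concrete, here is a minimal sketch of how one might 
check the plan and compare index resolution with filtered results. It uses 
cts:element-value-query (the function named in the subject line) and assumes 
documents containing an element named "x"; the element name and the value 
"value-1" are only the illustrative ones from this thread.

```xquery
xquery version "1.0-ml";

(: Sketch only: element "x" and the value "value-1" are illustrative. :)
(
  (: The plan shows which index terms the server will look up.
     Check whether the hyphenated value is reflected there. :)
  xdmp:plan(
    cts:search(fn:doc(),
      cts:element-value-query(xs:QName("x"), "value-1", "exact"))),

  (: Index resolution only: may include false positives if the plan
     does not capture everything in the query. :)
  fn:count(
    cts:search(fn:doc(),
      cts:element-value-query(xs:QName("x"), "value-1", "exact"),
      "unfiltered")),

  (: Filtered search (the default): candidates are checked against
     the actual document content. :)
  fn:count(
    cts:search(fn:doc(),
      cts:element-value-query(xs:QName("x"), "value-1", "exact"),
      "filtered"))
)
```

If the two counts differ, the index is not resolving the query precisely, 
and the plan should show which part of the query is missing.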

So if you want to do exact queries you can either:
(1) Enable fast-case-sensitive-searches and fast-diacritic-sensitive-searches 
on your database and run them as value queries.
OR
(2) Create a field with custom overrides for the significant punctuation or 
whitespace and run them as field word or field value queries.
OR
(3) Create a string range index with the appropriate collation (codepoint, most 
likely) and run them as string-range equality queries.
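
As a rough sketch, the three options might look like this as queries. The 
field name "x-exact" is hypothetical, and each query assumes the matching 
database configuration is already in place (the fast-*-searches settings for 
(1), a field with tokenizer overrides for (2), a string range index on "x" 
with the codepoint collation for (3)); without that configuration the field 
and range queries are not expected to work.

```xquery
xquery version "1.0-ml";

(: (1) Exact value query. Assumes fast-case-sensitive-searches and
       fast-diacritic-sensitive-searches are enabled on the database. :)
let $exact-value :=
  cts:element-value-query(xs:QName("x"), "value-1", "exact")

(: (2) Field value query. "x-exact" is a hypothetical field, assumed to
       be configured with a tokenizer override that treats "-" as a
       word character, so "value-1" is a single token. :)
let $field-value :=
  cts:field-value-query("x-exact", "value-1")

(: (3) String range equality query. Assumes a string range index on
       element "x" with the codepoint collation. :)
let $range-equal :=
  cts:element-range-query(xs:QName("x"), "=", "value-1",
    "collation=http://marklogic.com/collation/codepoint")

(: Count the matches for each approach so they can be compared. :)
for $query in ($exact-value, $field-value, $range-equal)
return fn:count(cts:search(fn:doc(), $query))
```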

Cheers

//Mary
