Just to summarize the ins and outs here in one place, because I'm seeing a
certain amount of confusion:
* xdmp:plan is your friend: it shows you the questions we ask the indexes.
If some part of your query is not reflected in the plan, index resolution
alone (i.e. unfiltered search) may return false positives for that part.
* Punctuation and space tokens are not indexed as words in the universal index.
Therefore, word queries involving whitespace or punctuation will not make use
of whitespace or punctuation in index resolution, regardless of space or
punctuation sensitivity.
* Punctuation and space tokens are generally not indexed as words in the
universal index for value queries either. However, as a special exception,
there are terms in the universal index for "exact" value queries (unstemmed,
case-sensitive, whitespace-sensitive, punctuation-sensitive), so "exact" value
queries should be resolvable properly from the index, but only if you have
fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in
the database.
* For field word or value queries, you can modify what counts as punctuation or
whitespace via tokenizer overrides. This can turn what would have been a phrase
into a single word.
* Outside the special case of exact value queries, all queries involving space
or punctuation are phrase queries. Word and value search is not string
matching.
* Space- and punctuation-insensitive does not mean tokenization-insensitive.
"foo-bar" will not match "foobar" as a value query or a word query, regardless
of your punctuation sensitivity. Word and value search is not string matching.
* String range queries are about string matching. Whether there is a match
depends on the collation, but there is no tokenization and no stemming, ever.
* If the plan for cts:value-query(xs:QName("x"),"value-1","exact") doesn't
include the hyphen, and you do have fast-case-sensitive-searches and
fast-diacritic-sensitive-searches enabled in the database, that is a bug.
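To make the tokenization points above concrete, here is a toy model in Python.
It is not MarkLogic's tokenizer or index: the names words and phrase_match are
hypothetical, a "word" is assumed to be a run of regex \w characters, and only
index resolution is modeled (no filtering, stemming, or case/diacritic
handling). It shows that punctuation and spaces drop out of the terms, so
"foo-bar" becomes the phrase ["foo", "bar"], which cannot match the single word
"foobar", while "value-1" and "value 1" look the same to the index.

```python
import re


def words(text):
    """Toy tokenizer: keep runs of word characters, drop spaces and punctuation.

    Hypothetical simplification; a real tokenizer is language-aware.
    """
    return re.findall(r"\w+", text)


def phrase_match(query, text):
    """Toy index resolution for a word query.

    The query's word tokens must appear as a contiguous run in the text's word
    tokens. Whitespace and punctuation never take part in the comparison.
    """
    q = words(query)
    t = words(text)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


if __name__ == "__main__":
    # Different tokenization: the phrase "foo bar" vs the single word "foobar".
    print(phrase_match("foo-bar", "foobar"))   # False
    # The hyphen is not in the index, so these resolve as matches (possible
    # false positives if you wanted punctuation sensitivity).
    print(phrase_match("foo-bar", "foo bar"))  # True
    print(phrase_match("value-1", "value 1"))  # True
```

The model deliberately stops at index resolution; in a filtered search the
candidate documents would then be checked against the actual text.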
So if you want to do exact queries, you have three options:
(1) Enable fast-case-sensitive-searches and fast-diacritic-sensitive-searches
on your database and run them as value queries.
OR
(2) Create a field with custom overrides for the significant punctuation or
whitespace and run them as field word or field value queries.
OR
(3) Create a string range index with the appropriate collation (codepoint, most
likely) and run them as string-range equality queries.
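The three options can be compared with a similar toy model. Again this is only
a sketch under stated assumptions: the key functions are hypothetical stand-ins
for what each index stores, the field override is assumed to reclassify "-" as
a word character, and the codepoint collation is approximated by plain Python
string equality. Without any of these, "value-1" and "value 1" produce the same
key; each option keeps them apart.

```python
import re


def word_tokens(text, extra_word_chars=""):
    """Toy tokenizer: runs of word characters plus any reclassified characters."""
    return tuple(re.findall(r"[\w" + re.escape(extra_word_chars) + r"]+", text))


def default_value_key(value):
    # Plain value query: spaces and punctuation are not part of the key.
    return word_tokens(value)


def exact_value_key(value):
    # Option 1 (toy): "exact" value term, case-, whitespace- and
    # punctuation-sensitive. Assumes fast-case-sensitive-searches and
    # fast-diacritic-sensitive-searches are enabled.
    return value


def field_value_key(value):
    # Option 2 (toy): field with a hypothetical tokenizer override that treats
    # "-" as a word character, so "value-1" becomes a single word.
    return word_tokens(value, extra_word_chars="-")


def codepoint_range_key(value):
    # Option 3 (toy): string range index with codepoint collation; the raw
    # string is compared, with no tokenization and no stemming.
    return value


if __name__ == "__main__":
    a, b = "value-1", "value 1"
    print(default_value_key(a) == default_value_key(b))      # True
    print(exact_value_key(a) == exact_value_key(b))          # False
    print(field_value_key(a) == field_value_key(b))          # False
    print(codepoint_range_key(a) == codepoint_range_key(b))  # False
```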
Cheers
//Mary
_______________________________________________
General mailing list
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general