On Wed, 29 Jun 2016 08:06:35 -0700, Wissam Asfahani (TSO GB)
<[email protected]> wrote:
> Good afternoon,
>
> We are having some issues estimating the number of documents when
> performing word queries containing punctuation characters.
>
> I have attached 4 sample documents. When using the below query, the
> estimate returns 3 and the count 1.
>
> Are there any db configuration settings we can use to ensure a more
> accurate estimate result?
>
>
> let $query := cts:word-query("4µ", ("exact"), 2)
>
> return
> (
> xdmp:estimate(cts:search(fn:doc(), $query)),
> fn:count(cts:search(fn:doc(), $query))
> )
>
>
> Wissam Asfahani
> XML Developer
>
Punctuation is not indexed in the word query indexes, so the index-only
estimate can match documents that the filtered count then rejects. An
exact, unwildcarded *value* query does consider punctuation, so if you can
arrange things so that you can use a value query, that could be a
solution. If it is just this character, and searching for it in this way
is confined to identifiable parts of the document, you could use field
tokenizer overrides to redefine µ as a word or symbol character for that
field. That said, it looks like µ is being classified as punctuation in
error: it should be classified as a letter anyway, since it is listed as
Ll (lowercase letter) in the Unicode tables.
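
A rough sketch of the value-query variant, in case it helps. The element
name "dose" is purely an assumption for illustration; substitute whichever
element holds the text. Note that a value query matches the element's
entire text content, so this only applies if "4µ" is the whole value and
not a word inside longer text.

```xquery
xquery version "1.0-ml";

(: "dose" is a hypothetical element name, not taken from the sample documents. :)
let $query := cts:element-value-query(xs:QName("dose"), "4µ", "exact")

return
(
  (: resolved from the indexes only :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: filtered against the actual documents :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the two numbers still differ, the remaining gap would be down to the
index settings on the database, not the punctuation.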
//Mary