Stress marks (Unicode code points 712 and 716, i.e. U+02C8 and U+02CC) seem to be treated as word separators for the purposes of tokenization. This makes it impossible to search for words containing them without actually entering the stress marks in the query.
Is there any way to avoid this? I.e., can we generate indexes that act as if these characters were simply not present? Suppose we were to wrap these characters in an element of some sort: could we cause the text on either side of the element to be merged into a single token (as with phrase-around)?

-Mike

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
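For illustration, here is a minimal sketch (in Python, outside MarkLogic) of one workaround implied by the question: normalize the text by deleting the stress-mark characters before it is indexed, so that tokens match queries typed without them. The function name and the normalization-as-preprocessing approach are assumptions for illustration, not a MarkLogic API.

```python
# Hypothetical preprocessing step: delete primary/secondary stress marks
# (U+02C8, U+02CC) so that words tokenize as if the marks were absent.
STRESS_MARKS = {0x02C8: None, 0x02CC: None}  # str.translate deletion map

def strip_stress(text: str) -> str:
    """Return text with U+02C8 and U+02CC removed."""
    return text.translate(STRESS_MARKS)

print(strip_stress("\u02c8to\u02cckenize"))  # -> "tokenize"
```

Applying this to document content before loading (or at query time to user input containing stress marks) would make the indexed tokens and query tokens agree; it does not, however, change how the server's own tokenizer classifies these characters.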
