On 28 July 2010 22:15, Mike Sokolov <[email protected]> wrote:
> Stress marks (UTF8 712 and 716) seem to be treated as word-separators
> for the purposes of tokenization. This makes it impossible to search
> for words containing them (without actually entering the stress marks in
> the query).
>
> Is there any way to avoid this? Ie to generate indexes that act as if
> these characters were simply not present?
>
> Suppose we were to wrap these characters in an element of some sort -
> could we cause text on either side of the element to be merged into a
> single token (as with phrase-around)?
Seems more like a kludge than a solution, Mike? Is there no way to write
the combination as a single codepoint? This seems like a character-level
issue rather than a markup one?

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
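A character-level workaround for the question above is to delete the two marks from both the indexed text and the query before tokenizing, so words are indexed as if the marks were absent. Decimal 712 and 716 are U+02C8 (MODIFIER LETTER VERTICAL LINE) and U+02CC (MODIFIER LETTER LOW VERTICAL LINE). The Python sketch below is not a MarkLogic setting or API. It assumes you can preprocess documents yourself (for example, by storing a stripped copy alongside the original) and apply the same step to queries. The function names and the naive whitespace tokenizer are hypothetical stand-ins for the real indexer.

```python
# Hypothetical preprocessing sketch, not a MarkLogic API.
# Map the stress-mark code points (decimal 712 and 716) to None so that
# str.translate deletes them.
STRESS_MARKS = {0x02C8: None, 0x02CC: None}


def strip_stress_marks(text: str) -> str:
    """Return text with primary (U+02C8) and secondary (U+02CC) stress marks removed."""
    return text.translate(STRESS_MARKS)


def index_tokens(text: str) -> list[str]:
    """Naive stand-in for an indexer: strip marks, lowercase, split on whitespace."""
    return strip_stress_marks(text).lower().split()


def matches(query: str, text: str) -> bool:
    """True if the normalized query word appears among the text's index tokens."""
    return strip_stress_marks(query).lower() in index_tokens(text)


if __name__ == "__main__":
    doc = "the word pro\u02C8nunci\u02CCation appears here"
    print(index_tokens(doc))
    print(matches("pronunciation", doc))
```

Because the same normalization is applied on both sides, a query typed without stress marks matches text that contains them, and a query that does include them still matches.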
