[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482923 ]
Hoss Man commented on LUCENE-841:
---------------------------------

There are lots of OSes and editors where changing the file encoding is somewhat hard, particularly if you have reasons why other files need to be in ASCII to deal with other systems.

It's a trade-off: people with UTF-8 capable environments would probably rather see the real character, while people still using ASCII would probably rather see \uXXXX. I would think the \uXXXX approach is the most universally functional, since anyone can look up a character from its character code, but people looking at funky control characters can't always tell what character code it is.

(I wonder if there is a fast/easy way to get a char from a Unicode character name?)

> Replace UTF8 characters in stemmer code with integer values.
> ------------------------------------------------------------
>
>                 Key: LUCENE-841
>                 URL: https://issues.apache.org/jira/browse/LUCENE-841
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Karl Wettin
>            Priority: Critical
>
> BrazilianStemmer, GermanStemmer, FrenchStemmer and DutchStemmer all contain
> UTF characters in the Java code. Not all environments handle that. It
> really ought to be integer values instead.
> I'll come up with a patch sooner or later.
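
To make the trade-off in the comment above concrete, here is a minimal sketch of the ASCII-only escape approach, plus one possible answer to the question about looking up a char by its Unicode character name. The class and method names (UnicodeEscapes, toEscape, fromName, nameOf) are made up for illustration and are not part of Lucene or of any patch on this issue. The name lookup assumes Java 9 or later, where Character.codePointOf(String) exists; Character.getName(int) needs Java 7 or later.

```java
public class UnicodeEscapes {
    // LATIN SMALL LETTER A WITH DIAERESIS (code point 0x00E4), written as an
    // ASCII-only Unicode escape so the source file survives any file encoding.
    static final char A_UMLAUT = '\u00E4';

    // Render a char as the ASCII-only escape text for that char,
    // e.g. when converting an existing UTF-8 source file.
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // Look up a code point from its Unicode character name (Java 9+).
    static int fromName(String name) {
        return Character.codePointOf(name);
    }

    // Reverse lookup: the Unicode name of a code point (Java 7+).
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toEscape(A_UMLAUT));
        System.out.println(fromName("LATIN SMALL LETTER A WITH DIAERESIS") == A_UMLAUT);
        System.out.println(nameOf(0x00E4));
    }
}
```

A runtime name lookup like this would not help inside the stemmers themselves, since a char constant has to be known at compile time, but something like nameOf could be used to generate a trailing comment next to each escape so that readers on either kind of environment can tell which character is meant.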