[ 
https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482923
 ] 

Hoss Man commented on LUCENE-841:
---------------------------------

There are lots of OSes and editors where changing the file encoding is somewhat 
hard, particularly if you have reasons why other files need to be in ASCII to 
deal with other systems.

It's a trade-off: people with UTF-8 capable environments would probably rather 
see the real character, while people still using ASCII would probably rather 
see \uXXXX. I would think the \uXXXX approach is the most universally 
functional, since anyone can look up a character from its character code, but 
people looking at funky control characters can't always tell which character 
code they are looking at.
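
To make the trade-off concrete, here is a minimal sketch of the escaped form. 
The class and method names are hypothetical and not taken from the Lucene 
stemmers; it only shows that an escape keeps the source file pure ASCII while 
denoting the same char as the literal character would.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class UnicodeEscapeDemo {

    // Returns the text with every non-ASCII char rewritten as a
    // backslash-u escape followed by four lowercase hex digits.
    static String toEscaped(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escape denotes u-umlaut (code 252) without putting a
        // non-ASCII byte into the source file.
        char umlaut = '\u00fc';
        System.out.println((int) umlaut); // prints 252

        // ASCII-only rendering of a word that contains the u-umlaut.
        System.out.println(toEscaped("f" + umlaut + "r"));
    }
}
```

The compiler sees the same char either way; the only difference is what a 
person reading the source file sees.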

(I wonder if there is a fast/easy way to get a char from a Unicode character 
name?)
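
One possible answer, as a sketch only: JDKs from Java 9 onward provide 
Character.codePointOf(String), and Java 7 onward provides the reverse lookup 
Character.getName(int). Neither is available on older JDKs. The wrapper class 
and method below are hypothetical, and the sketch assumes the character fits 
in a single char.

```java
// Hypothetical wrapper for illustration only; requires Java 9 or later.
final class UnicodeNameLookup {

    // Returns the char for a Unicode character name.
    // Assumes the named character lies in the Basic Multilingual Plane.
    static char charFromName(String unicodeName) {
        // Throws IllegalArgumentException if the name is unknown.
        int codePoint = Character.codePointOf(unicodeName);
        if (!Character.isBmpCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                "Not representable as a single char: " + unicodeName);
        }
        return (char) codePoint;
    }

    public static void main(String[] args) {
        char c = charFromName("LATIN SMALL LETTER U WITH DIAERESIS");
        System.out.println(Integer.toHexString(c)); // prints fc
        System.out.println(Character.getName(c));   // reverse lookup
    }
}
```

A lookup like this runs at run time, so it does not replace escapes in 
character literals, but it can help when checking which code a name maps to.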

> Replace UTF8 characters in stemmer code with integer values.
> ------------------------------------------------------------
>
>                 Key: LUCENE-841
>                 URL: https://issues.apache.org/jira/browse/LUCENE-841
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Karl Wettin
>            Priority: Critical
>
> BrazilianStemmer, GermanStemmer, FrenchStemmer and DutchStemmer all contain 
> UTF-8 characters in the Java code. Not all environments handle that. They 
> really ought to be integer values instead.
> I'll come up with a patch sooner or later.


