[ https://issues.apache.org/jira/browse/LUCENE-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe resolved LUCENE-3983.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
    Lucene Fields: New,Patch Available  (was: New)

Committed to trunk.

I don't think it's worth backporting to the 3.6 branch. The only danger here
was that, if the set of recognized uppercase variants of HTML character
entities ever grew, one of the new entries might contain an "i"; since
branch 3.6 is bugfix-only, though, that set will never grow.
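A small sketch of the reasoning above, assuming Java 7+ (for `Locale.forLanguageTag`). For lowercase ASCII entity names, upper-casing under the Turkish locale differs from upper-casing under `Locale.ENGLISH` only when the name contains an "i", so a set without such names is unaffected. The class `UpperCaseRiskCheck`, its method `namesAtRisk`, and the set contents in `main` are hypothetical and for illustration only; they are not taken from the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): flags entity names whose
// upper-case form depends on whether the Turkish locale is in effect.
final class UpperCaseRiskCheck {
  private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

  // Returns the names whose Turkish upper-case form differs from the
  // Locale.ENGLISH form (for lowercase ASCII names: those containing 'i').
  static Set<String> namesAtRisk(Set<String> names) {
    Set<String> risky = new LinkedHashSet<>();
    for (String name : names) {
      if (!name.toUpperCase(TURKISH).equals(name.toUpperCase(Locale.ENGLISH))) {
        risky.add(name);
      }
    }
    return risky;
  }

  public static void main(String[] args) {
    // Assumed contents, for illustration only.
    Set<String> current = new LinkedHashSet<>(
        Arrays.asList("quot", "copy", "gt", "lt", "reg", "amp"));
    System.out.println(namesAtRisk(current)); // []

    // If the set grew to include a name containing "i", it would be flagged.
    Set<String> grown = new LinkedHashSet<>(current);
    grown.add("xi");
    System.out.println(namesAtRisk(grown)); // [xi]
  }
}
```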
                
> HTMLCharacterEntities.jflex uses String.toUpperCase without Locale
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3983
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3983
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-3983.patch
>
>
> Is this expected?
> {code:java}
>       "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
>       "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
>     };
>     for (int i = 0 ; i < entities.length ; i += 2) {
>       Character value = entities[i + 1].charAt(0);
>       entityValues.put(entities[i], value);
>       if (upperCaseVariantsAccepted.contains(entities[i])) {
>         entityValues.put(entities[i].toUpperCase(), value);
>       }
>     }
> {code}
> In my opinion, this should look like:
> {code:java}
>       "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
>       "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
>     };
>     for (int i = 0 ; i < entities.length ; i += 2) {
>       Character value = entities[i + 1].charAt(0);
>       entityValues.put(entities[i], value);
>       if (upperCaseVariantsAccepted.contains(entities[i])) {
>         entityValues.put(entities[i].toUpperCase(Locale.ENGLISH), value);
>       }
>     }
> {code}
> (Otherwise, in the Turkish locale, entities containing "i" (like "xi" ->
> '\u03BE') will not be detected correctly.)
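The following sketch illustrates the effect described in the issue, assuming Java 7+. Rather than changing the JVM default locale, it passes the conversion locale explicitly, so the Turkish result (what the no-argument `toUpperCase()` would produce under a Turkish default locale) can be compared with the `Locale.ENGLISH` fix. `EntityUpperCaseDemo` and `build` are hypothetical names, and unlike the real code the sketch registers an uppercase variant for every entity instead of consulting `upperCaseVariantsAccepted`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch modelled on the loop quoted above; the locale used for
// the upper-case variant is a parameter so both behaviours can be compared.
final class EntityUpperCaseDemo {
  static Map<String, Character> build(String[] entities, Locale upperCaseLocale) {
    Map<String, Character> values = new HashMap<>();
    for (int i = 0; i < entities.length; i += 2) {
      Character value = entities[i + 1].charAt(0);
      values.put(entities[i], value);
      // Simplification: every entity gets an upper-case variant here.
      values.put(entities[i].toUpperCase(upperCaseLocale), value);
    }
    return values;
  }

  public static void main(String[] args) {
    String[] entities = {"xi", "\u03BE", "zeta", "\u03B6"};
    Locale turkish = Locale.forLanguageTag("tr-TR");

    // Turkish rules map "xi" to "X" + U+0130 (capital I with dot above).
    Map<String, Character> turkishMap = build(entities, turkish);
    System.out.println(turkishMap.containsKey("XI")); // false

    // A fixed locale yields "XI" regardless of the JVM default.
    Map<String, Character> englishMap = build(entities, Locale.ENGLISH);
    System.out.println(englishMap.containsKey("XI")); // true
  }
}
```

Passing the locale explicitly avoids mutating global state with `Locale.setDefault`; `Locale.ROOT` would serve equally well as a locale-neutral choice.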


        
