Liwei Lin created PARQUET-430:
---------------------------------

             Summary: Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
                 Key: PARQUET-430
                 URL: https://issues.apache.org/jira/browse/PARQUET-430
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.8.0, 1.8.1
            Reporter: Liwei Lin
            Assignee: Liwei Lin
            Priority: Minor
             Fix For: 1.9.0


A String is being converted to upper or lower case using the platform's default
locale. This may result in improper conversions when the default locale has
special case-mapping rules.

For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", where
'ı' -- without a dot -- is the LATIN SMALL LETTER DOTLESS I character. To
obtain correct results for locale-insensitive strings, we should use
toLowerCase(Locale.ENGLISH).
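
The behaviour described above can be reproduced with a small standalone
program. This is only a sketch: the class name LocaleCaseDemo and its helper
methods are made up for illustration and are not part of parquet-mr. It passes
the Turkish locale explicitly instead of relying on the JVM default, and
compares against Unicode escapes so the result does not depend on the console
encoding.

```java
import java.util.Locale;

public class LocaleCaseDemo {
    // Turkish locale, passed explicitly so the demo does not depend on the JVM default.
    public static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Lower-cases using Turkish case-mapping rules ('I' becomes dotless U+0131).
    public static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    // Lower-cases using English case-mapping rules ('I' becomes plain 'i').
    public static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // \u0131 is LATIN SMALL LETTER DOTLESS I.
        System.out.println(lowerTurkish("TITLE").equals("t\u0131tle")); // true
        System.out.println(lowerEnglish("TITLE").equals("title"));      // true
        // The reverse direction is affected too: \u0130 is capital I with dot above.
        System.out.println("title".toUpperCase(TURKISH).equals("T\u0130TLE")); // true
    }
}
```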

For more information on this, please see:
- http://stackoverflow.com/questions/11063102/using-locales-with-javas-tolowercase-and-touppercase
- http://lotusnotus.com/lotusnotus_en.nsf/dx/dotless-i-tolowercase-and-touppercase-functions-use-responsibly.htm
- http://java.sys-con.com/node/46241

This ticket proposes to change our use of String.toUpperCase()/toLowerCase() to
String.toUpperCase(Locale.ENGLISH)/toLowerCase(Locale.ENGLISH).
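
As a rough illustration of why this matters for identifier-like strings, the
sketch below looks up an enum constant by a case-insensitive name. The class
CodecLookup, its Codec enum, and both methods are hypothetical and not taken
from the parquet-mr code base; they only show the general shape of the change.
With a Turkish default locale, the no-argument variant turns "gzip" into
"GZ\u0130P" and the lookup fails, while the Locale.ENGLISH variant keeps
working.

```java
import java.util.Locale;

public class CodecLookup {
    // Hypothetical enum, for illustration only.
    public enum Codec { GZIP, SNAPPY, UNCOMPRESSED }

    // Before: the result depends on the JVM's default locale.
    public static Codec fromNameDefaultLocale(String name) {
        return Codec.valueOf(name.toUpperCase());
    }

    // After: the result is the same regardless of the default locale.
    public static Codec fromName(String name) {
        return Codec.valueOf(name.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        // Simulate a JVM running with a Turkish default locale.
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));
        try {
            System.out.println(fromName("gzip")); // GZIP
            try {
                fromNameDefaultLocale("gzip");
            } catch (IllegalArgumentException e) {
                // valueOf finds no constant matching the Turkish upper-cased name.
                System.out.println("default-locale lookup failed");
            }
        } finally {
            // Restore the original default so the rest of the JVM is unaffected.
            Locale.setDefault(saved);
        }
    }
}
```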



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
