[jira] Commented: (IO-167) Fix case-insensitive string handling

Benjamin Bentmann (JIRA) Sat, 24 May 2008 10:08:33 -0700

    [ 
https://issues.apache.org/jira/browse/IO-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599611#action_12599611
 ]


Benjamin Bentmann commented on IO-167:
--------------------------------------

bq. I don't believe the FileSystemUtils changes will make any difference to 
their operation
I'm not sure whether you did not read the mail post I mentioned or whether it 
just wasn't clear enough, so I will try to explain again. The correctness of 
{{FileSystemUtils}} depends on its ability to correctly detect the underlying 
OS. This detection is based on recognition of known OS names, which - for 
resiliency - is intended to be case-insensitive. If you're familiar with the 
Unicode standard, you will remember that character casing for non-English 
languages is non-trivial. As just one example, the Turkish language defines 
the lower case form of "I" to be "ı" (dotless i). In other words, if a JVM 
runs with the Turkish locale and the system property "os.name" returns "IRIX", 
"UNIX", "MPE/IX" or "SOLARIS", the unpatched {{FileSystemUtils}} will not 
detect the OS. As a consequence, {{freeSpaceOs()}} fails with an exception.
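
The failure mode can be reproduced with a minimal sketch. The class and method 
names below are hypothetical and are not taken from {{FileSystemUtils}}; the 
check only assumes that detection lower-cases the OS name and searches for a 
known token. {{Locale.forLanguageTag()}} requires Java 7 or later.

```java
import java.util.Locale;

class TurkishCaseDemo {
    // Hypothetical stand-in for an OS-name check; NOT the real FileSystemUtils code.
    // Lower-cases the name with the given locale and looks for the token "irix".
    static boolean looksLikeIrix(String osName, Locale locale) {
        return osName.toLowerCase(locale).indexOf("irix") != -1;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Turkish rules map 'I' to dotless 'ı' (U+0131), so the token is not found.
        System.out.println(looksLikeIrix("IRIX", turkish));        // false
        // English rules map 'I' to 'i', so the token is found.
        System.out.println(looksLikeIrix("IRIX", Locale.ENGLISH)); // true
    }
}
```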

So when you doubt the patch will make a difference to the operation, is that 
because you believe the outlined preconditions will never occur or because an 
exception doesn't make a difference to you?

bq. the package-private IOCase convertCase() method is only used by the 
FilenameUtils's wildcardMatch() method
Just one question for my own understanding: Is {{wildcardMatch()}} meant to be 
platform-dependent? In other words, would it be considered correct for the 
method if a call with argument {{IOCase.INSENSITIVE}} returns different matches 
based on the user's locale?

bq. it seems wrong to me to hard-code English in principle
"believe", "seems"... with all due respect, correctness is not a matter of gut 
feeling. I have no problem if somebody proves me wrong, but such a proof must 
be based on specs, APIs or otherwise authoritative materials.

From the API docs for 
[{{String.toLowerCase()}}|http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()]:
bq. To obtain correct results for locale insensitive strings, use 
toLowerCase(Locale.ENGLISH)

I believe that file names should be understood as locale insensitive strings, 
as a matter of interoperability, but that assumption might be wrong.

Using the English locale for the case conversion will not limit the code to 
ASCII characters, if this was your concern. It will merely pin the behavior of 
{{String.to*erCase()}} to platform-independent conversion rules. If you look at 
the source code for {{to*erCase()}}, you will notice that it has an {{if}} for 
the languages "tr", "az" and "lt". The selection of {{Locale.ENGLISH}} is quite 
arbitrary; {{Locale.GERMAN}} or {{Locale.FRENCH}} will work equally well. The 
key point is to avoid the {{if}} regardless of the user's locale.
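
As a small illustration of both points (non-ASCII characters still convert, 
and the particular non-special locale does not matter), consider the following 
sketch. The class name and helper are made up for this example.

```java
import java.util.Locale;

class PinnedLocaleDemo {
    // Hypothetical helper: lower-cases with an explicitly chosen locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String name = "SOLARIS";
        // All three locales avoid the special-cased languages and agree.
        System.out.println(lower(name, Locale.ENGLISH)); // solaris
        System.out.println(lower(name, Locale.GERMAN));  // solaris
        System.out.println(lower(name, Locale.FRENCH));  // solaris
        // Turkish takes the special branch: 'I' becomes dotless 'ı' (U+0131).
        System.out.println(lower(name, Locale.forLanguageTag("tr")).equals("solar\u0131s")); // true
        // Non-ASCII input is still converted under the English locale:
        // U+00C4 (A with diaeresis) becomes U+00E4.
        System.out.println(lower("\u00C4", Locale.ENGLISH).equals("\u00E4")); // true
    }
}
```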

Back to Unicode: case conversions can be defined in terms of isolated 1:1 
character mappings or context-sensitive m:n mappings matching some written 
language. In most cases (e.g. when you don't want to produce text for human 
consumption), Java code seeks platform independence, which implies locale 
independence. Unicode offers this via the 1:1 character mappings, available 
via {{Character.to*erCase()}} and {{String.equalsIgnoreCase()}}. If one wants 
to approximate this behavior using {{String.to*erCase()}}, one must lock the 
locale.
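
A rough sketch of the locale-independent route, assuming a per-{{char}} loop 
is acceptable (it ignores supplementary code points); the class and method 
names are invented for illustration and are not part of the attached patch:

```java
import java.util.Locale;

class CharMappingDemo {
    // Hypothetical helper: applies the 1:1 mapping of Character.toLowerCase()
    // to each char, so the result does not depend on any locale.
    static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulate a JVM whose default locale is Turkish.
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));
        System.out.println("MPE/IX".toLowerCase().equals("mpe/ix"));   // false: default locale applies
        System.out.println(lowerPerChar("MPE/IX").equals("mpe/ix"));   // true: 1:1 mapping
        System.out.println("MPE/IX".equalsIgnoreCase("mpe/ix"));       // true: per-character comparison
    }
}
```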

> Fix case-insensitive string handling
> ------------------------------------
>
>                 Key: IO-167
>                 URL: https://issues.apache.org/jira/browse/IO-167
>             Project: Commons IO
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Benjamin Bentmann
>         Attachments: IO-167.patch
>
>
> Case-insensitive operations are currently platform-dependent, please see 
> [Common Bug #3|http://www.nabble.com/Re%3A-Common-Bugs-p14931921s177.html] 
> for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
