[
https://issues.apache.org/jira/browse/IO-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599611#action_12599611
]
Benjamin Bentmann commented on IO-167:
--------------------------------------
bq. I don't believe the FileSystemUtils changes will make any difference to
their operation
I'm not sure whether you did not read the mail post I mentioned or whether it just wasn't
clear enough, so I will try to explain again. The correctness of
{{FileSystemUtils}} depends on its capability to correctly detect the
underlying OS. This detection is based on recognition of known OS names which -
for resiliency - is intended to be case-insensitive. If you're familiar with the
Unicode standard, you will remember that character casing for non-English
languages is non-trivial. As just one example, the Turkish language
defines the lower case form of "I" to be "ı" (dotless i). In other words, if a
JVM runs on the Turkish locale and the system property "os.name" returns
"IRIX", "UNIX", "MPE/IX" or "SOLARIS", the unpatched {{FileSystemUtils}} will
not detect the OS. As a consequence, {{freeSpaceOs()}} fails with an exception.
So when you doubt that the patch will make a difference to the operation, is that
because you believe the outlined preconditions will never occur or because an
exception doesn't make a difference to you?
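To make the failure mode concrete, here is a minimal sketch. It assumes a simplified detection routine that lower-cases "os.name" and searches for known names; the class {{OsDetectionDemo}}, its method {{detect()}} and the name list are hypothetical stand-ins, not the actual {{FileSystemUtils}} code:

```java
import java.util.Locale;

/** Hypothetical, simplified OS detection in the spirit of the check described above. */
class OsDetectionDemo {

    // Known OS names, written in lower case (hypothetical subset for illustration).
    private static final String[] KNOWN = {"irix", "unix", "mpe/ix", "solaris"};

    /** Returns the first known name contained in the lower-cased os.name, or null. */
    static String detect(String osName, Locale locale) {
        String lower = osName.toLowerCase(locale);
        for (String known : KNOWN) {
            if (lower.contains(known)) {
                return known;
            }
        }
        return null; // not detected; a caller such as freeSpaceOs() would then fail
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, 'I' lower-cases to dotless i (U+0131), so "IRIX" no longer contains "irix".
        System.out.println(detect("IRIX", turkish));        // null
        System.out.println(detect("IRIX", Locale.ENGLISH)); // irix
    }
}
```

Under the Turkish locale none of "IRIX", "UNIX", "MPE/IX" or "SOLARIS" is recognized by such a check, which is exactly the precondition for the exception.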
bq. the package-private IOCase convertCase() method is only used by the
FilenameUtils's wildcardMatch() method
Just one question for my own understanding: Is {{wildcardMatch()}} meant to be
platform-dependent? In other words, would it be considered correct for the
method if a call with argument {{IOCase.INSENSITIVE}} returns different matches
based on the user's locale?
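To make the question concrete: if an {{IOCase.INSENSITIVE}} comparison lower-cases both operands with the default locale, the same pair of names can match for one user and not for another. A minimal sketch, assuming a hypothetical helper {{equalsViaLowerCase()}} rather than the real {{convertCase()}}/{{wildcardMatch()}} implementation:

```java
import java.util.Locale;

/**
 * Hypothetical sketch: a case-insensitive equality check that lower-cases both
 * sides with a given locale, as a helper relying on the default locale would.
 */
class CaseInsensitiveMatchDemo {

    static boolean equalsViaLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under Turkish rules "FILE.TXT" becomes "f" + dotless i + "le.txt", so it no longer equals "file.txt".
        System.out.println(equalsViaLowerCase("FILE.TXT", "file.txt", turkish));        // false
        System.out.println(equalsViaLowerCase("FILE.TXT", "file.txt", Locale.ENGLISH)); // true
    }
}
```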
bq. it seems wrong to me to hard-code English in principle
"believe", "seems"... With all respect, correctness is not a matter of gut
feeling. I have no problem if somebody proves me wrong, but such a proof must
be based on specs, APIs or other authoritative material.
From the API docs for
[{{String.toLowerCase()}}|http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()]:
bq. To obtain correct results for locale insensitive strings, use
toLowerCase(Locale.ENGLISH)
I believe that file names should be understood as locale-insensitive strings,
as a matter of interoperability, but that assumption might be wrong.
Using the English locale for the case conversion will not limit the code to
ASCII characters, if that was your concern. It will merely pin the behavior of
{{String.to*erCase()}} to platform-independent conversion rules. If you look at
the source code for {{to*erCase()}}, you will notice that it has an {{if}} for
the languages "tr", "az" and "lt". The choice of Locale.ENGLISH is quite
arbitrary; Locale.GERMAN or Locale.FRENCH would work equally well. The key point
is to avoid that {{if}} regardless of the user's locale.
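A small sketch of the point about the {{if}}: only "tr", "az" and "lt" get special treatment, so English, German and French produce identical results, and non-ASCII letters such as "Ä" are still converted. The class name and the sample file name are made up for illustration:

```java
import java.util.Locale;

/**
 * Sketch: String.toLowerCase(Locale) only special-cases "tr", "az" and "lt",
 * so ENGLISH, GERMAN and FRENCH agree, and non-ASCII letters are still converted.
 */
class FixedLocaleDemo {

    // Sample file name starting with capital A with diaeresis (U+00C4).
    static final String NAME = "\u00C4RGER-INFO.TXT";
    // Locale-independent lower-case form, starting with small a with diaeresis (U+00E4).
    static final String EXPECTED = "\u00E4rger-info.txt";

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.ENGLISH, Locale.GERMAN, Locale.FRENCH}) {
            // Prints "en: true", "de: true", "fr: true".
            System.out.println(locale + ": " + NAME.toLowerCase(locale).equals(EXPECTED));
        }
        // Turkish turns the 'I' in "INFO" into dotless i (U+0131), so the result differs.
        System.out.println("tr: " + NAME.toLowerCase(Locale.forLanguageTag("tr")).equals(EXPECTED)); // tr: false
    }
}
```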
Back to Unicode: case conversions can be defined either in terms of isolated 1:1
character mappings or in terms of context-sensitive m:n mappings that match some
written language. In most cases (e.g. when you don't want to produce text for human
consumption), Java code aims for platform independence, which implies
locale independence. Unicode offers this via the 1:1 character mappings,
available via {{Character.to*erCase()}} and {{String.equalsIgnoreCase()}}. If
one wants to approximate this behavior using {{String.to*erCase()}}, one must
lock the locale.
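The difference between the two kinds of mapping can be seen in a few lines. This sketch assumes a hypothetical helper {{lowerCharByChar()}} that applies {{Character.toLowerCase()}} per code point. Note that sharp s has no 1:1 upper-case mapping, while {{String.toUpperCase()}} expands it to "SS", which is why a locked-locale {{String.to*erCase()}} is only an approximation:

```java
import java.util.Locale;

/**
 * Sketch contrasting the locale-independent 1:1 mappings (Character.toLowerCase,
 * String.equalsIgnoreCase) with String.toUpperCase, which may change the length.
 */
class OneToOneMappingDemo {

    /** Hypothetical helper: lower-cases using only the 1:1 per-code-point mappings. */
    static String lowerCharByChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().map(Character::toLowerCase).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulate a Turkish user; the calls below do not depend on the default locale.
        Locale.setDefault(Locale.forLanguageTag("tr"));
        System.out.println(lowerCharByChar("IRIX"));         // irix
        System.out.println("IRIX".equalsIgnoreCase("irix")); // true
        // Sharp s (U+00DF) has no 1:1 upper-case mapping, so Character leaves it unchanged...
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
        // ...while String applies the 1:2 mapping from SpecialCasing.
        System.out.println("\u00DF".toUpperCase(Locale.ENGLISH)); // SS
    }
}
```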
> Fix case-insensitive string handling
> ------------------------------------
>
> Key: IO-167
> URL: https://issues.apache.org/jira/browse/IO-167
> Project: Commons IO
> Issue Type: Bug
> Affects Versions: 1.4
> Reporter: Benjamin Bentmann
> Attachments: IO-167.patch
>
>
> Case-insensitive operations are currently platform-dependent, please see
> [Common Bug #3|http://www.nabble.com/Re%3A-Common-Bugs-p14931921s177.html]
> for details.
--
This message is automatically generated by JIRA.