On Jul 18, 2007, at 2:11 AM, Joe Orton wrote:
> - it is convention on all modern Unixes I'm aware of that filename
> charset/encoding follows LC_CTYPE; not just Linux. It may derive from
> Solaris, I think that's where the locale APIs originate.
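
For reference, the convention described above amounts to using
whatever nl_langinfo(CODESET) reports once the environment's locale
has been adopted. A minimal sketch in standard C, with nothing
platform-specific assumed:

    #include <langinfo.h>
    #include <locale.h>
    #include <stdio.h>

    int main(void)
    {
        /* Adopt the charset/encoding from the environment (LC_ALL,
         * LC_CTYPE, LANG); without this call the default "C" locale
         * is in effect. */
        setlocale(LC_ALL, "");

        /* The codeset applications are expected to use for filenames
         * under this convention. */
        printf("codeset: %s\n", nl_langinfo(CODESET));
        return 0;
    }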
I guess I don't know how that works in practice. When you have an
encoded string, you need to know its encoding. On a file system,
there is typically no metadata to indicate the encoding of the file
name string.
Suppose I set my locale settings to correspond to encoding A and
write a file, while yours correspond to encoding B. On Linux, one
expects the file name to display differently for the other user?
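
To make that concrete: the kernel stores and returns the name as an
uninterpreted byte string, so what each user sees depends entirely on
the encoding their tools assume. A sketch that dumps the raw bytes
(error handling trimmed):

    #include <dirent.h>
    #include <stdio.h>

    /* Dump each filename in the current directory as raw bytes.  The
     * kernel returns exactly the bytes stored at create time; how
     * they render depends on the encoding the viewer assumes. */
    int main(void)
    {
        DIR *d = opendir(".");
        struct dirent *e;

        if (d == NULL)
            return 1;
        while ((e = readdir(d)) != NULL) {
            const unsigned char *p;
            for (p = (const unsigned char *)e->d_name; *p != '\0'; p++)
                printf("%02x ", *p);
            printf("  %s\n", e->d_name);
        }
        closedir(d);
        return 0;
    }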
What we do is expect the application to translate as appropriate,
so that both users see the same string regardless of the locale
settings. (Note that in Darwin, I don't actually think that most
command line applications work well with locales, so I'm referring
mostly to GUI apps.) So in my example, even though your encoding and
mine differ, the name is UTF-8 on disk and is converted to the
appropriate encoding when displayed.
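
A sketch of what that translation might look like using iconv(3); the
filename literal is only an example, the iconv prototype differs
slightly across platforms, and Darwin may need -liconv at link time:

    #include <iconv.h>
    #include <langinfo.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    /* Translate an on-disk UTF-8 filename to the user's locale
     * encoding for display.  Error handling is minimal; a real
     * application needs a fallback for unconvertible names. */
    int main(void)
    {
        char in[] = "r\xc3\xa9sum\xc3\xa9.txt"; /* "résumé.txt", UTF-8 */
        char out[256];
        char *inp = in, *outp = out;
        size_t inleft = strlen(in), outleft = sizeof(out) - 1;
        iconv_t cd;

        setlocale(LC_ALL, "");                  /* pick up user's locale */
        cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
        if (cd == (iconv_t)-1)
            return 1;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1) {
            *outp = '\0';
            printf("%s\n", out);
        }
        iconv_close(cd);
        return 0;
    }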
> - AFAIK this convention is not standardised anywhere.
It should at least be documented; word-of-mouth is a poor way to
establish a convention. But that's neither here nor there.
> - Linux-the-kernel is no different from any other Unix kernel in this
> respect; it doesn't care about filename charset/encoding and doesn't
> set policy for userspace. Many Linux distributions set up UTF-8
> locales (via $LANG etc) by default, and expect applications to follow
> the convention.
> - if Darwin has a configurable locale, does *not* set this up by
> default such that nl_langinfo(CODESET) returns UTF-8, but does by
> policy require filenames in UTF-8 regardless of locale, I would agree
> with changing apr_filepath_encoding as Erik proposed. Is that the
> case?
I don't know what the BSD locale system (nl_langinfo, etc.) does in
Darwin; I've never worked with it. I only know that for file names,
we tell developers to use UTF-8.
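
For context, apr_filepath_encoding is the call whose answer the
proposal would change on Darwin. A minimal sketch of a caller, using
the style constants from apr_file_info.h:

    #include <stdio.h>
    #include <apr_general.h>
    #include <apr_pools.h>
    #include <apr_file_info.h>

    int main(void)
    {
        apr_pool_t *pool;
        int style;

        apr_initialize();
        apr_pool_create(&pool, NULL);

        /* Ask APR which encoding it believes filepaths use here. */
        if (apr_filepath_encoding(&style, pool) == APR_SUCCESS) {
            switch (style) {
            case APR_FILEPATH_ENCODING_UTF8:
                puts("paths are UTF-8");         /* what Darwin would
                                                    report under the
                                                    proposed change */
                break;
            case APR_FILEPATH_ENCODING_LOCALE:
                puts("paths follow the locale"); /* the Unix convention
                                                    discussed above */
                break;
            default:
                puts("encoding unknown");
            }
        }

        apr_pool_destroy(pool);
        apr_terminate();
        return 0;
    }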
-wsv