On Jul 18, 2007, at 2:11 AM, Joe Orton wrote:

> - it is convention on all modern Unixes I'm aware of that filename
> charset/encoding follows LC_CTYPE; not just Linux.  It may derive from
> Solaris; I think that's where the locale APIs originate.

I guess I don't know how that works in practice. When you have an encoded string, you need to know its encoding, and on a file system there is (typically) no metadata to indicate the encoding of the file name string.

So suppose I set my locale settings to correspond to encoding A and write a file, while your locale uses encoding B. On Linux, is the expectation that the file name displays differently for each of us?

What we do is expect the application to translate as appropriate, so that both users see the same string regardless of their locale settings. (Note that on Darwin I don't think most command-line applications actually handle locales well, so I'm referring mostly to GUI apps.) So in my example, even though your encoding and mine differ, the name is UTF-8 on disk and is converted to the appropriate encoding when displayed.
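
To make that concrete, here is a minimal POSIX C sketch of the kind of translation an application would do at display time. It's my illustration, not code from Darwin's actual frameworks (real GUI apps go through Cocoa/CoreFoundation rather than iconv), and the file name "résumé.txt" is made up; the only assumption is the one above, that the name on disk is UTF-8 and the user's locale supplies the display encoding:

    /*
     * Illustrative sketch only: convert a UTF-8 on-disk file name to
     * the user's locale encoding for display.  Uses standard POSIX
     * APIs: setlocale(3), nl_langinfo(3), iconv(3).  On Darwin/BSD,
     * link with -liconv.
     */
    #include <stdio.h>
    #include <string.h>
    #include <locale.h>
    #include <langinfo.h>
    #include <iconv.h>

    int main(void)
    {
        /* Pick up the user's locale (encoding B in the example above). */
        setlocale(LC_CTYPE, "");
        const char *codeset = nl_langinfo(CODESET);

        /* The name as stored on disk: always UTF-8 by convention. */
        char on_disk[] = "r\xC3\xA9sum\xC3\xA9.txt";   /* "résumé.txt" */

        iconv_t cd = iconv_open(codeset, "UTF-8");
        if (cd == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }

        char out[256];
        char *in_p = on_disk, *out_p = out;
        size_t in_left = strlen(on_disk), out_left = sizeof(out) - 1;
        if (iconv(cd, &in_p, &in_left, &out_p, &out_left) == (size_t)-1) {
            perror("iconv");    /* e.g. name not representable in codeset */
            iconv_close(cd);
            return 1;
        }
        *out_p = '\0';
        iconv_close(cd);

        printf("display (%s): %s\n", codeset, out);
        return 0;
    }

Note the failure mode: if the locale's codeset can't represent some character in the name, iconv() errors out, which is exactly the hazard of letting the display encoding vary per user while the bytes on disk stay fixed.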

> - AFAIK this convention is not standardised anywhere.

It should at least be documented; word of mouth is a poor way to propagate a convention. But that's neither here nor there.

> - Linux-the-kernel is no different from any other Unix kernel in this
> respect; it doesn't care about filename charset/encoding and doesn't set
> policy for userspace.  Many Linux distributions set up UTF-8 locales
> (via $LANG etc) by default, and expect applications to follow the
> convention.

> - if Darwin has a configurable locale, does *not* set this up by default
> such that nl_langinfo(CODESET) returns UTF-8, but does by policy require
> filenames in UTF-8, regardless of locale, I would agree with changing
> apr_filepath_encoding as Erik proposed.  Is that the case?

I don't know what the BSD locale system (nl_langinfo, etc.) does in Darwin; I've never worked with it. I only know that for file names, we tell developers to use UTF-8.
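
For anyone who wants to check empirically, here is a tiny C sketch (mine, not from this thread) that prints what nl_langinfo(CODESET) reports before and after adopting the environment's locale:

    /*
     * Sketch: what codeset does the locale system report?
     * Before setlocale() we're in the default "C"/"POSIX" locale;
     * after setlocale(LC_CTYPE, "") we follow $LANG/$LC_CTYPE.
     */
    #include <stdio.h>
    #include <locale.h>
    #include <langinfo.h>

    int main(void)
    {
        printf("default codeset: %s\n", nl_langinfo(CODESET));
        setlocale(LC_CTYPE, "");
        printf("locale codeset:  %s\n", nl_langinfo(CODESET));
        return 0;
    }

If the second line doesn't say UTF-8 on a stock Darwin install while file names are nonetheless required to be UTF-8, that would confirm the divergence Joe is asking about.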

        -wsv
