On Aug 6, 2007, at 5:11 PM, Roy T. Fielding wrote:
I agree. But is it the case that non-native mounted filesystems are name-translated by the kernel? I mean, if OS X did this consistently for all mount points, then I would see it as being reasonable for the OS X applications to reject anything else.
According to the tech note on this, if the encoding for the underlying volume format is known, it should be translated to UTF-8 at the VFS layer by the file system implementation:
http://developer.apple.com/qa/qa2001/qa1173.html
Actually, it also crashes on valid utf-8 in normal form, because OS X doesn't follow the standard on normalization. See "man -s 5 utf8":
    If more than a single representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always used. Longer ones are detected as an error as they pose a potential security risk, and destroy the 1:1 character:octet sequence mapping.
but OS X requires the longer composition characters over shorter ones. My guess is that choice was driven by the way the UI allows such characters to be composed (like "alt-u u" for u-umlaut).
Above the VFS layer, we always use decomposed UTF-8.
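To make the two cases above concrete, here is a minimal Python sketch, assuming only the standard unicodedata module. Standard NFD is used as a stand-in for "decomposed"; the exact decomposition rules applied by OS X file systems may differ in details. It suggests that the precomposed and decomposed spellings of u-umlaut are both shortest-form UTF-8, whereas the overlong sequence 0xC0 0x80 quoted from the man page is rejected by a strict decoder.

```python
import unicodedata

# u-umlaut as a single precomposed code point (U+00FC).
precomposed = "\u00fc"
# The same character decomposed: "u" followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", precomposed)

precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(precomposed_bytes)  # two bytes: 0xC3 0xBC
print(decomposed_bytes)   # three bytes: "u", 0xCC, 0x88
# Both spellings decode cleanly; the overlong encoding of NUL does not.
print(is_valid_utf8(precomposed_bytes), is_valid_utf8(decomposed_bytes))
print(is_valid_utf8(b"\xc0\x80"))
```

The decomposed name is longer in bytes, but each code point in it is still encoded in its shortest form, so the two issues are separate ones.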
Of course, even with these issues, the Mac still kicks ass.
Well, that's a given.
Again, same as with volume formats, if the zip file format defines the encoding in zip files, then this should be easy (insofar as encodings are easy) for the software to deal with.
Sadly, it doesn't (filenames are just null-terminated strings). There are options for conversion from EBCDIC, but nothing to transcode the filenames in general as they are unzipped. Maybe the zip command maintainer will take that as an enhancement request.
Right, same with all archive formats. You need to either define the name encoding as X or add some metadata to let you specify what encoding is in use (and, ideally, require that this be provided).
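As a rough sketch of what that would buy the extracting tool: if the archive (or the user) declares the name encoding, transcoding a stored name into decomposed UTF-8 is mechanical. The function name transcode_archive_name and its declared_encoding parameter are hypothetical, not part of zip or any other archiver, and standard NFD is only an approximation of what OS X file systems store.

```python
import unicodedata


def transcode_archive_name(raw_name: bytes, declared_encoding: str) -> bytes:
    """Convert a stored file name to decomposed UTF-8.

    raw_name is the name exactly as stored in the archive, and
    declared_encoding is the encoding the archive metadata claims for it.
    Both are assumptions of this sketch. A name that does not match the
    declared encoding raises UnicodeDecodeError instead of being guessed at.
    """
    text = raw_name.decode(declared_encoding)    # strict decoding
    text = unicodedata.normalize("NFD", text)    # decompose, e.g. ü -> u + U+0308
    return text.encode("utf-8")


# A Latin-1 name containing u-umlaut (single byte 0xFC) becomes "u" + 0xCC 0x88.
print(transcode_archive_name(b"m\xfcnchen.txt", "latin-1"))
```

Without the declared encoding there is nothing safe to pass as the second argument, which is the gap described above.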
You still have to hope that the inbound encoding is correct (that is, that svn somehow knows it). On OS X, that's easy; it's UTF-8. Once other operating systems come into the mix, it'll work as well as the encodings are defined (and known to svn) on those systems.
What I do currently is define
    setenv MM_CHARSET "utf-8"
    setenv LANG "en_US.utf-8"
in my shell init file.
On Mac OS (at least), that isn't relevant with respect to filenames, which is what the patch that Erik proposed fixes.
It is, however, relevant to how a CLI application encodes data sent to the terminal. That is, the above means that Terminal.app expects to see UTF-8 English text. (I think; again, I don't really know much about BSD locale settings.)
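A small Python sketch of that distinction, under the assumption that a CLI tool simply takes the codeset from a LANG-style value and uses it for terminal output. The helper names codeset_from_lang and encode_for_terminal are made up for illustration, and real locale handling (LC_ALL, LC_CTYPE, and so on) is more involved.

```python
def codeset_from_lang(lang_value: str, default: str = "ascii") -> str:
    """Extract the codeset from a value like "en_US.utf-8".

    Falls back to `default` when no codeset is present (for example "C").
    The ASCII fallback is an assumption of this sketch.
    """
    base = lang_value.split("@", 1)[0]   # drop a modifier such as "@euro"
    if "." in base:
        return base.split(".", 1)[1]
    return default


def encode_for_terminal(text: str, lang_value: str) -> bytes:
    """Encode output text the way the terminal is declared to expect it."""
    return text.encode(codeset_from_lang(lang_value), errors="replace")


print(encode_for_terminal("\u00fc", "en_US.utf-8"))  # UTF-8 bytes for u-umlaut
print(encode_for_terminal("\u00fc", "C"))            # unencodable, replaced by "?"
```

Nothing here touches file names; it only affects the bytes written to the terminal.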
-wsv — Wilfredo Sánchez - [EMAIL PROTECTED]