On Wed, Dec 04, 2002 at 09:51:03AM -0500, Jungshik Shin wrote:
>
>
> On 3 Dec 2002, H. Peter Anvin wrote:
>
> > By author:    Jungshik Shin <[EMAIL PROTECTED]>
>
> > > The same is true here. Although Unix file system has few
> > > restrictions on file/dir names, it needs to have a provision to specify
> > > how to deal with multiple representations of equivalent characters. Is
> > > there anything mentioned about this in SUS?
> >
> > Yes.  Filenames are byte sequences, period, full stop.  Any attempt at
> > normalization would violate SUS/POSIX.
>
> All right. That's what the *current* SUS/POSIX says. However, that
> is hardly a solace to a user who'd be puzzled that two visually
> identical and canonically equivalent filenames are treated differently.
> For instance, U+00D6 (Latin Capital Letter O with diaeresis) should look
> identical to, and be treated identically with, U+004F followed by U+0308.
> That's what users expect. I don't know what's the best way to resolve
> this conflict. It may be time to consider seriously this particular
> aspect of SUS/POSIX. I'm wondering how MacOS X (well, it's not 100%
> SUS/POSIX compliant, but nonetheless it's Unix) works in this area. It
> uses NFD. That is, 'U+00D6' is stored as 'U+004F U+0308' and both are
> treated identically.
Well, users should not expect these two sequences to be identical;
according to ISO/IEC 10646, they are not.

Kind regards
Keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
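
For anyone who wants to see the two representations side by side, here is
a minimal Python sketch. It only assumes the standard unicodedata module;
the helper name same_after_normalizing is made up for illustration and
is not something a POSIX file system would call. It shows that the
precomposed and decomposed forms are different byte sequences in UTF-8
(which is all a byte-oriented file system sees), and that they compare
equal only once both are run through the same normalization form.

```python
import unicodedata

# The two spellings discussed in the thread.
precomposed = "\u00D6"       # LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed = "\u004F\u0308"  # LATIN CAPITAL LETTER O + COMBINING DIAERESIS


def same_after_normalizing(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after applying one
    Unicode normalization form ("NFC", "NFD", "NFKC" or "NFKD")."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# What a byte-sequence file system would store for each name.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(precomposed_bytes)                # b'\xc3\x96'
print(decomposed_bytes)                 # b'O\xcc\x88'
print(precomposed_bytes == decomposed_bytes)            # False
print(same_after_normalizing(precomposed, decomposed))  # True

# NFD turns the precomposed character into the decomposed pair.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Whether such a comparison belongs in the file system, in the C library,
or in applications is exactly the open question in this thread; the
sketch takes no position on that.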
