On Wed, Dec 04, 2002 at 09:51:03AM -0500, Jungshik Shin wrote:
> 
> 
> On 3 Dec 2002, H. Peter Anvin wrote:
> 
> > By author:    Jungshik Shin <[EMAIL PROTECTED]>
> 
> > >  The same is true here. Although Unix file system has few
> > > restrictions on file/dir names, it needs to have a provision to specify
> > > how to deal with multiple representations of equivalent characters. Is
> > > there anything mentioned about this in SUS?
> >
> > Yes.  Filenames are byte sequences, period, full stop.  Any attempt at
> > normalization would violate SUS/POSIX.
> 
>   All right. That's what the *current* SUS/POSIX says. However, that
> is hardly a solace to a user who'd be puzzled that two visually
> identical and canonically equivalent filenames are treated differently.
> For instance, U+00D6 (Latin Capital Letter O with diaeresis) should look
> identical to, and be treated identically with, U+004F followed by U+0308. That's
> what users expect.  I don't know what's the best way to resolve
> this conflict. It may be time to consider seriously this particular
> aspect of SUS/POSIX.  I'm wondering how MacOS X (well, it's not 100%
> SUS/POSIX compliant, but nonetheless it's Unix) works in this area. It
> uses NFD. That is, 'U+00D6' is stored as 'U+004F U+0308' and both are
> treated identically.

Well, users should not expect these two sequences to be identical;
according to ISO/IEC 10646, they are not.
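
A minimal sketch of the distinction, assuming Python 3 and its standard
unicodedata module; the variable and helper names are hypothetical. The two
spellings differ as code-point and byte sequences, which is all a
byte-oriented filesystem sees, while Unicode normalization maps both to the
same form:

```python
import unicodedata

# Two spellings of the same visible character (names are illustrative only).
precomposed = "\u00D6"        # LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed = "\u004F\u0308"   # LATIN CAPITAL LETTER O + COMBINING DIAERESIS


def same_bytes(a: str, b: str) -> bool:
    """Compare the way a byte-sequence filename comparison would (UTF-8 assumed)."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(precomposed.encode("utf-8"))   # bytes of the precomposed spelling
    print(decomposed.encode("utf-8"))    # bytes of the decomposed spelling
    print(same_bytes(precomposed, decomposed))
    print(canonically_equivalent(precomposed, decomposed))
```

Whether a filesystem should apply the second comparison is exactly the
point under dispute; the sketch only shows that the two comparisons can
give different answers.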

Kind regards
Keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
