Hi,

While this thread has drifted into the display of, and access to,
FILENAMES that happen to occur in C programs (or elsewhere),
the IDENTIFIER normalization issue is more profound in C programs.

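If the C compiler does NOT apply some consistent normalization, then
two spellings of the same IDENTIFIER (say, precomposed in one source
file and decomposed in another) end up as different symbol names,
the linker cannot resolve one against the other, and the link fails.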
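
A minimal sketch of the mismatch, in Python only for brevity (its
standard unicodedata module does the normalization).  The identifier
below is made up for illustration, and comparing UTF-8 bytes is my
assumption about how a compiler/linker pair would compare names that
were never normalized:

```python
import unicodedata

# Hypothetical identifier, spelled two ways that render identically.
precomposed = "\u00d6ffne"   # U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed = "O\u0308ffne"   # U+004F followed by U+0308 COMBINING DIAERESIS

# Without normalization the UTF-8 byte sequences differ, so a tool that
# compares symbol names byte for byte sees two different names.
raw_match = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both spellings to NFC they compare equal.
nfc_match = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(raw_match, nfc_match)  # False True
```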

To assume that some keyboard input method does the normalization
is naive.  Lots of C programs are machine-generated (more and more
in these days of UML -> WSDL -> whatever language tools).

And while NFC (Canonical Decomposition, followed by Canonical
Composition) seems obvious, for an IDENTIFIER, it may well be
that NFKC (Compatibility Decomposition, followed by Canonical
Composition) becomes more desirable, because it "reduces" the 
visual look-alikes.
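
To make the NFC/NFKC difference concrete, here is a small sketch
(again Python, using the standard unicodedata module; the sample
strings are my own).  It also shows why "reduces" is in quotes:
NFKC folds compatibility characters such as ligatures, but it does
not merge look-alikes from different scripts.

```python
import unicodedata

ligature = "\ufb01le"   # starts with U+FB01 LATIN SMALL LIGATURE FI
plain = "file"          # four ASCII letters

# NFC leaves the ligature alone; NFKC replaces it with "f" + "i".
nfc_same = unicodedata.normalize("NFC", ligature) == plain
nfkc_same = unicodedata.normalize("NFKC", ligature) == plain

# Cross-script look-alikes survive NFKC: Cyrillic U+0430 is not
# folded to Latin "a".
cyrillic_a = "\u0430"
still_distinct = unicodedata.normalize("NFKC", cyrillic_a) != "a"

print(nfc_same, nfkc_same, still_distinct)  # False True True
```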

Cheers,
- Ira McDonald
  High North Inc



-----Original Message-----
From: H. Peter Anvin [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 04, 2002 1:43 PM
To: [EMAIL PROTECTED]
Subject: Re: filename and normalization (was gcc identifiers)


Followup to:  <[EMAIL PROTECTED]>
By author:    Glenn Maynard <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> On Wed, Dec 04, 2002 at 04:03:38PM +0100, Keld Jørn Simonsen wrote:
> > Well, users should not expect these two sequences to be identical,
> > they are not, according to ISO/IEC 10646.
> 
> Users expect that "Ö" == "Ö", and don't know or care about Unicode, and
> that's reasonable.
> 
> Programmers should care, of course, but programmers aren't the only ones
> who use filenames, and this problem, as Henry pointed out, is a more
> general one.
> 

The issue is where the normalization enters the picture.  It should be
done at input time, so that when a user presses the Ö key on their
keyboard they get U+00D6.  Problem solved.  If this is U+004F U+0308
then someone has entered something weird to begin with.  "ls" may
choose to display this as an anomaly by outputting it as O\u0308 or
something like that, but that's again a presentation issue.
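
One way such a display rule could look, as a rough sketch only: the
function name is hypothetical, this is not how any existing "ls"
behaves, and it is written in Python with the standard unicodedata
module.  It escapes every combining character and leaves everything
else untouched:

```python
import unicodedata

def show_anomalies(name: str) -> str:
    """Return name with each combining character written as \\uXXXX."""
    out = []
    for ch in name:
        if unicodedata.combining(ch) != 0:
            # Combining mark: make it visible instead of rendering it.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return "".join(out)

print(show_anomalies("O\u0308"))   # O\u0308
print(show_anomalies("\u00d6"))    # precomposed form is left unchanged
```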

          -hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt    <[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/