>  That's simple, but how would you deal with the fact that
> Unicode has multiple representations of what people would usually
> regard as equivalent?  To enable UTF-8 identifiers, that has
> to be taken care of by gcc and linker (if gcc doesn't do a
> compile-time normalization).

I'd say you wouldn't :)
Just accept a null-terminated string of non-"/" bytes for filenames,
and accept any ALPHANUMERIC, "_", or HIGH_ASCII byte for identifiers.

No normalization, no processing, not even proper utf-8 validation.
The programmer of course may choose to use proper utf-8 and
some normalization form as a convention, but I see no need to enforce
it in the compiler.
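
To make the identifier rule concrete, here is a rough sketch in C. It is
not how gcc's lexer is written; is_ident_byte and ident_length are names
I made up for illustration, and I am assuming HIGH_ASCII means any byte
with the top bit set (0x80 to 0xFF).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if byte c may appear in an identifier.
   Bytes >= 0x80 are accepted blindly; no UTF-8 validation is done. */
static int is_ident_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '_' ||
           c >= 0x80;
}

/* Hypothetical helper: length in bytes of the identifier at the start
   of s, or 0 if s does not start with one. A leading digit is rejected,
   as usual for C identifiers. */
static size_t ident_length(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    if (p[0] >= '0' && p[0] <= '9')
        return 0;
    while (is_ident_byte(p[n]))   /* stops at NUL or any other ASCII byte */
        n++;
    return n;
}
```

Every byte of a multi-byte utf-8 sequence is 0x80 or above, so the
scanner never confuses part of a character with an ASCII operator or
with whitespace.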


>
> Is there anything mentioned about this in SUS?
I'm sorry, what is SUS?


> > Text strings and comments already work fine with utf-8. Just
> > identifiers don't. I think even a "use at your own risk" command
> > line switch, such as "--allow-high-ascii" would be a huge step
> > forward.
> 
>   Why would you use such a 'legacy-sounding' option name? I'd use
> '--allow-utf8-names'. 

It is legacy sounding, because I would rather have it be the default.
It's more appropriate as well: the compiler wouldn't have to know
anything about utf-8 in this case, it just knows that there is a set
of bytes which don't cause any problems. This is, I think, a large
part of what utf-8 was designed for, originally.

Normalization, imo, is more for UI/security issues, like DNS lookups,
etc. Besides, if you were to come across some source code with
tons of overlong utf-8 encodings or non-normalized glyphs, that would
raise some eyebrows at least. (No need to have gcc bend over backwards
to normalize the stuff.)
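
For what it's worth, here is a small sketch of what "no normalization"
means in practice. The names e_composed, e_decomposed and
same_identifier are made up for this example; the assumption is that
identifiers are compared byte for byte, as a linker would compare
symbol names.

```c
#include <string.h>

/* "é" as the single precomposed code point U+00E9. */
static const char e_composed[] = "\xC3\xA9";

/* "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT. */
static const char e_decomposed[] = "e\xCC\x81";

/* Hypothetical helper: byte-wise comparison with no normalization step,
   so two spellings that render the same can still be different names. */
static int same_identifier(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With this rule the two spellings above are simply two distinct
identifiers. Picking one form and sticking to it is then a project
convention, not something the compiler checks.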




--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/