RE: filename and normalization (was gcc identifiers)

Jungshik Shin Wed, 04 Dec 2002 14:24:20 -0800

On Wed, 4 Dec 2002, Maiorana, Jason wrote:

> Normalization for D has some serious drawbacks: if you were to try
> to implement, say vietnamese using only composing characters,
> it would look horrible. The appearance, position, shape, and size
> of the combining accents depend on which letter they are being
> combined with, as well as which other diacritics are being combined
> with that same letter.

  What's your point here? NFD or NFC, they should be rendered
identically by 'modern' rendering engines. You're assuming
that the way characters are rendered depends on the NF in which they're
stored/represented. At least in principle, that should not be the case.
Even a not-so-capable renderer (e.g. xterm with a bitmap font or the
Linux console) can do an internal normalization to fit its needs
and capabilities.
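
  To make this concrete: the NFC and NFD forms of the same text are
canonically equivalent, and a renderer (or any other consumer) can
normalize internally before choosing glyphs. A minimal sketch using
Python's standard unicodedata module, with the Vietnamese letter
U+1EBF (e with circumflex and acute) picked only as an example:

```python
import unicodedata

# U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE, precomposed (NFC)
nfc = "\u1ebf"
# The same letter fully decomposed (NFD): 'e' + combining circumflex + combining acute
nfd = unicodedata.normalize("NFD", nfc)

print([hex(ord(c)) for c in nfc])  # ['0x1ebf']
print([hex(ord(c)) for c in nfd])  # ['0x65', '0x302', '0x301']

# The two strings differ code point by code point...
print(nfc == nfd)  # False
# ...but normalizing either one to a common form makes them identical,
# which is what a renderer can do internally before picking glyphs.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

  How the accents are positioned is then up to the font and the shaping
engine, not to which NF the text happened to be stored in.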

> NF-C is most appropriate for some scripts, and NF-D may be desirable
> for others. It would be better,

  What are your criteria? Again, rendering? As I wrote above,
that has nothing to do with the NF used.

> IMO, if unicode would get rid
> of both forms, and simply support one representation of each
> possible glyph. (No combining characters unless they are the ONLY

  'glyphs'? Coded character set is not about glyphs but about
characters.

> way to represent a particular glyph) (Actually, no combining chars
> at all would be best, because it's simplest. Why not just assign
> more code space to the langs that need it?)

  Do you want to give 1.5 million (and more) code points to the Korean
script? Why don't you propose your idea to the UTC and ISO/IEC JTC1/SC2/WG2?
Either your mailbox will be bombarded with a lot of emails
or you will be greeted with 'dead silence'.
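
  For scale: even the modern precomposed Hangul block is generated
arithmetically from a small set of conjoining jamo, which is exactly
why NFD can represent it with combining sequences. A sketch of the
composition arithmetic, assuming the constants of the Hangul syllable
algorithm in the Unicode Standard (the Python names below are mine),
checked against Python's unicodedata:

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm (Unicode Standard, ch. 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # number of precomposed modern syllables


def compose(l: int, v: int, t: int = 0) -> str:
    """Precomposed syllable from jamo indices (t == 0 means no final consonant)."""
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable: str) -> str:
    """Split a precomposed syllable into conjoining jamo, as NFD does."""
    s_index = ord(syllable) - S_BASE
    l = L_BASE + s_index // (V_COUNT * T_COUNT)
    v = V_BASE + (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t = s_index % T_COUNT
    jamo = chr(l) + chr(v)
    if t:  # a trailing consonant is present
        jamo += chr(T_BASE + t)
    return jamo


han = compose(18, 0, 4)  # U+D55C HANGUL SYLLABLE HAN
print(hex(ord(han)), S_COUNT)
print(decompose(han) == unicodedata.normalize("NFD", han))  # True
```

  That covers only 19 x 21 x 28 = 11,172 modern syllables. Syllables
built from archaic jamo have no precomposed code points at all and can
only be written as jamo sequences.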

> If you have a filesystem that forces NF-D, then I would say it's a
> poorly designed filesystem that makes such choices, because it's
> way too low level to care about things like that. Filenames should
> be "string of bytes", and the UI-conventions should allow one
> to distinguish. If you are on a NF-C==canonical system, and you
> mount such a filesystem, you should see bakemoji, and not
> any translated normalization form.

  Why bakemoji? No matter which NF is used in filenames, they should
simply be rendered the way any Unicode-compliant rendering engine
renders them. This behavior is more consistent with your view
that filenames are strings of bytes than showing 'bakemoji'.
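
  Where the two forms really do diverge is at the byte level: the NFC and
NFD spellings of the same name are different byte strings, so a
filesystem that treats names as "strings of bytes" sees two distinct
names, while a Unicode-aware display can still show them identically.
A minimal illustration in Python (the name "café.txt" is only an
example):

```python
import unicodedata

name = "caf\u00e9.txt"  # "café.txt" with a precomposed é
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc_bytes)  # b'caf\xc3\xa9.txt'
print(nfd_bytes)  # b'cafe\xcc\x81.txt'
# Different bytes, so a byte-oriented filesystem treats these as two names...
print(nfc_bytes == nfd_bytes)  # False
# ...yet the decoded texts are canonically equivalent and should render the same.
print(unicodedata.normalize("NFC", nfd_bytes.decode("utf-8")) == name)  # True
```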

  Jungshik Shin

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/