Re: wcwidth and locale

Rich Felker Mon, 16 Apr 2007 10:19:36 -0700

On Tue, Apr 17, 2007 at 12:11:12AM +0800, Abel Cheung wrote:
> On 4/11/07, Rich Felker <[EMAIL PROTECTED]> wrote:
> >Indeed, glibc's character data is horribly outdated and incorrect.
> >There are plenty of unsupported nonspacing characters, even characters
> >that were present in Unicode 4.0. It also considers nonspacing letters
> >to be non-alphabetic, which is a real problem for users of languages
> >which utilize nonspacing letters.
> 
> AFAIK Pablo Saratxaga has done something about it [1], though I
> didn't intend to dig deeper and check what has been done.
> 
> [1] http://sourceware.org/bugzilla/show_bug.cgi?id=3885


This works, but UGH, it's so disgusting. Someday people need to realize
that the POSIX charmap/localedef format is utterly broken for use with
Unicode and replace it with something reasonable that doesn't take 200
megs of core.

> It really depends on the intended audience of the fonts. The original
> intention for those double width Greek and Cyrillic characters is to
> make them align nicely with all other CJK characters. But there are
> no such things as wide Greek/Cyrillic characters or wide versions of
> some other symbols in Unicode, so font designers in Asia are forced
> to make them wide and map them to narrow ones, since they must
> support legacy encoding for commercial or whatever reason.
> They are doing this out of no choice (except discarding those
> glyphs, which would offend other users).

This is only an issue on character-cell devices which use wcwidth. For
GUI applications, the metrics of the font will govern layout and
alignment, so either can be used. I don't think it's such a big deal
to say these fonts with wide Greek, Cyrillic, etc. aren't suitable for
terminals. In fact they could be automatically used just by squeezing
the glyph horizontally and cropping off the excess spacing.
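
As a rough illustration of how a character-cell program assigns columns,
here is a minimal Python sketch. It is not glibc's wcwidth; cell_width and
string_width are hypothetical helpers built on the standard unicodedata
module, and they assume that East Asian Ambiguous characters (which include
many Greek and Cyrillic letters) are narrow, in line with the position above.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal columns for one character.

    Hypothetical helper, not a replacement for wcwidth(): control
    characters are not special-cased here (wcwidth would return -1).
    """
    # Nonspacing and enclosing marks occupy no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # Wide and Fullwidth characters (CJK ideographs etc.) take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including Ambiguous Greek/Cyrillic, takes one cell.
    return 1


def string_width(s: str) -> int:
    """Total columns for a string, summing the per-character widths."""
    return sum(cell_width(c) for c in s)
```

Under these assumptions, a string such as "αβ中" occupies 4 columns no
matter how wide the font draws the Greek glyphs, which is why a font with
double-width Greek would overlap or need squeezing on a terminal.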

> I'm also bitten by this issue -- PUA codepoints always have wcwidth=1,
> and it would make CJK fonts suck again because characters keep
> overlapping against each other. Yes, PUA usage should be avoided
> whenever possible, but we will still see legacy systems in the
> near future.

Yes, PUA is very bad. I wouldn't be opposed to designating a certain
portion of the PUA as "wide", but I question whether using the PUA on
charcell devices is even needed.
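
To make the idea of a "wide" PUA portion concrete, here is a small Python
sketch. The range chosen (Supplementary Private Use Area-A, U+F0000 to
U+FFFFD) is purely an assumption for illustration; nothing in this thread
or in Unicode designates it as wide, and WIDE_PUA and pua_cell_width are
hypothetical names.

```python
import unicodedata

# Hypothetical convention: treat plane 15 private-use code points as wide.
WIDE_PUA = range(0xF0000, 0xFFFFE)


def pua_cell_width(ch: str) -> int:
    """Column width for a private-use character under the assumed convention."""
    # General category "Co" marks private-use code points.
    if unicodedata.category(ch) != "Co":
        raise ValueError("not a private-use character")
    return 2 if ord(ch) in WIDE_PUA else 1
```

With such a convention, legacy CJK glyphs would have to be mapped into the
designated range to get two cells, while the BMP PUA (U+E000 to U+F8FF)
would keep its current width of 1.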

> Not to mention some characters would never have the
> chance to enter Unicode.

We can debate whether things like the Apple™® symbol are characters or
not all we like, but can you come up with things that should
legitimately be wide (i.e. ideographs) which have no chance to enter
Unicode?

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
