On Tue, Apr 17, 2007 at 12:11:12AM +0800, Abel Cheung wrote:
> On 4/11/07, Rich Felker <[EMAIL PROTECTED]> wrote:
> >Indeed, glibc's character data is horribly outdated and incorrect.
> >There are plenty of unsupported nonspacing characters, even characters
> >that were present in Unicode 4.0. It also considers nonspacing letters
> >to be non-alphabetic, which is a real problem for users of languages
> >which utilize nonspacing letters.
>
> AFAIK Pablo Saraxtaga has done something about it [1], though I
> didn't intend to dig deeper and check what has been done.
>
> [1] http://sourceware.org/bugzilla/show_bug.cgi?id=3885

This works, but UGH it's so disgusting. Someday people need to realize
that POSIX charmap/localedef format is utterly broken for use with
Unicode and replace it with something reasonable that doesn't take 200
megs of core.

> It really depends on the intended audience of the fonts. The original
> intention for those double width Greek and Cyrillic characters is to
> make them align nicely with all other CJK characters. Then there are
> no such thing as wide Greek/Cyrillic characters and wide version of
> some other symbols in Unicode, so font designers in Asia are forced
> to make them wide and map them to narrow ones, since they must
> support legacy encoding for commercial or whatever reason.
> They are doing this out of no choice (except discarding those
> glyphs, which would offend other users).

This is only an issue on character-cell devices which use wcwidth. For
GUI applications, the metrics of the font will govern layout and
alignment, so either can be used. I don't think it's such a big deal
to say these fonts with wide Greek, Cyrillic, etc. aren't suitable for
terminals. In fact they could be automatically used just by squeezing
the glyph horizontally and cropping off the excess spacing.

> I'm also bitten by this issue -- PUA codepoints always have wcwidth=1,
> and it would make CJK fonts suck again because characters keep
> overlapping against each other. Yes, PUA usage should be avoided
> whenever possible, but we would still see legacy systems in the
> short future.

Yes, PUA is very bad. I wouldn't be opposed to designating a certain
portion of the PUA as "wide", but I question whether using the PUA on
charcell devices is even needed.

> Not to mention some characters would never have the
> chance to enter Unicode.

We can debate whether things like the Apple™® symbol are characters or
not all we like, but can you come up with things that should
legitimately be wide (i.e. ideographs) which have no chance to enter
Unicode?
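
For illustration only, here is a rough sketch of what a width function
for a character-cell device might look like if part of the PUA were
designated "wide". It is written in Python against the interpreter's
bundled Unicode data, not against glibc's tables, and it is not how
glibc's wcwidth is implemented. The names cell_width, string_width and
WIDE_PUA are made up for this example, and the U+F000..U+F8FF range is
an arbitrary assumption; no standard assigns widths inside the PUA.

```python
import unicodedata

# Hypothetical convention: treat this slice of the BMP Private Use Area
# as double-width. The range is an illustration only; nothing defines it.
WIDE_PUA = range(0xF000, 0xF900)


def cell_width(ch: str) -> int:
    """Return the number of terminal cells for one code point (0, 1 or 2).

    Control and format characters are not handled specially in this sketch.
    """
    if ord(ch) in WIDE_PUA:
        return 2
    # Nonspacing and enclosing marks occupy no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is narrow, including East Asian Ambiguous
    # characters such as Greek and Cyrillic letters and the rest of
    # the PUA.
    return 1


def string_width(s: str) -> int:
    """Sum the cell widths of every code point in s."""
    return sum(cell_width(ch) for ch in s)
```

With these assumptions, Greek and Cyrillic letters stay one cell wide,
CJK ideographs take two, and only the designated PUA slice is widened,
so a legacy font that maps its wide glyphs there would no longer overlap
its neighbours on a terminal.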

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
