On Tue, Apr 17, 2007 at 02:04:32AM +0800, Abel Cheung wrote:
> >This is only an issue on character-cell devices which use wcwidth.
>
> I'm exactly talking about those apps, like terminals.
Given how utterly abysmal current terminals' Unicode support is, this seems like a relatively minor issue. I don't want to disparage concern about getting it right, but rather to investigate where we're at now and what needs to be done. Along those lines, I recently evaluated some terminals, with the following results:

Konsole and Xfce Terminal: no support for nonspacing characters; unsure whether CJK wide characters are handled correctly.

Gnome Terminal: I assume it's the same, since Xfce Terminal uses the same widget. Please correct me if I'm mistaken, since I didn't try it.

urxvt and xterm: CJK and nonspacing character widths are correct, but rendering of nonspacing characters is minimal (plain overstrike). No bidi or complex script support. xterm's default of only one combining character per cell is horribly deficient for any language that doesn't just use precomposed characters anyway.

aterm/rxvt/Eterm/etc.: unmaintained; no UTF-8 support at all.

mlterm: CJK and nonspacing character widths are correct, bidi is available (not sure how well it works) with correct Arabic shaping, and Indic reordering/shaping is available but as a special case (not sure how well it works either). Also, the cursor position becomes nonsensical (and font-dependent) with Indic shaping, making screen-mode (my terminology, as opposed to line-mode) apps difficult to use.

uuterm (experimental; by me): CJK and nonspacing character widths are correct. Shaping/ligatures are supported and sufficient for all scripts as far as I know, but they rely on a nonstandard font system (ucf). Bidi and reordering (for Indic vowel marks on the left) are not available.

So as of now, here is the status of support for the particular languages I'm aware of:

European-script languages using precomposed forms only: any terminal except legacy ones lacking UTF-8 support should be fine.

European-script languages with multiple decomposed accents: uuterm is probably the only one that works.
Languages of India: mlterm and some old, unmaintained Indic-specific terminals (pre-Unicode, I think) are the only ones that work.

CJK, Thai, Lao: urxvt, xterm, mlterm, and uuterm all work. uuterm is the only one that supports decomposed Korean (Hangul Jamo), though.

Tibetan: uuterm is the only terminal that works correctly, but a minimal degree of legibility can be obtained with an ugly tailored font that does not require shaping, which makes urxvt, xterm, and mlterm usable.

Burmese: not supported by anything.

Arabic and Hebrew: mlterm, and perhaps some RTL-specific terminal emulators I'm not aware of?

Mongolian: unknown; probably only mlterm, and I'm unsure whether it even works acceptably well.

One additional issue I have not tested is support for characters outside the BMP. I know GNU screen totally lacks support for these, and I suspect many terminal emulators have the same problem.

~Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
