Egmont Koblinger <[EMAIL PROTECTED]> wrote: Hi,
> I noticed that in a full UTF-8 environment editing a binary file with
> joe-3.1 sooner or later may cause some display corruptions, e.g. parts of
> the file overwrite the topmost row (status bar of joe) after pressing a
> PageDown.
>
> Looking closer at the problem I noticed that different terminal emulators
> behave differently, and the problem is caused by certain Unicode entities
> having different width under various terminal emulators.
>
> ...

Yes, that's a problem and it can hardly be solved because it's both an issue
of evolving versions and of different features implemented by various
terminals.

> Okay, so here come my questions:
>
> - Where can I find specification about the terminal width of each and every
> Unicode character?

Nowhere. The reference mentioned by Bruno defines what Unicode thinks should
be done, and even so it leaves a number of ambiguous cases open.
This doesn't give you reliable information on what your terminal does.
For example, xterm implements double-width and combining characters, the
Linux console does not. There is no standard and no system information that
would tell you how the currently used terminal does it.

> - Is glibc's wcwidth() considered to be a good implementation? What about
> the cases where it returns -1, including U+0603 mentioned above?

That depends on what you need it for. My answer would be no because it
doesn't help you with cell-spaced terminals and terminal emulators.

> ...
>
> - What shall a terminal emulator do with the cursor position if it receives
> a character that is not assigned and known that won't be assigned, or when it
> receives a character that is not yet assigned?

It should display an appropriate replacement indication (in a width as
explained by Bruno) such as U+FFFD REPLACEMENT CHARACTER, or a plain box.
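Since no system information reports how the current terminal treats a given
character, one option is to ask the terminal itself: print the character at
a known column, request a cursor position report (ESC [ 6 n), and see how far
the cursor moved. The following is only a minimal sketch of that idea in
Python. The function names are made up for illustration, the actual tty
handling (raw mode, reading the reply from the terminal) is left out, and it
is not meant to describe how any particular editor implements its detection.

```python
import re
from typing import Tuple

# Cursor position report sent by VT100-style terminals: ESC [ row ; col R
_CPR = re.compile(r"\x1b\[(\d+);(\d+)R")


def probe_sequence(ch: str) -> str:
    """Text to write to the terminal: go to column 1, print ch,
    then request a cursor position report (ESC [ 6 n)."""
    return "\r" + ch + "\x1b[6n"


def parse_cursor_report(reply: str) -> Tuple[int, int]:
    """Extract (row, column), both 1-based, from the terminal's reply."""
    m = _CPR.search(reply)
    if m is None:
        raise ValueError("no cursor position report found in reply")
    return int(m.group(1)), int(m.group(2))


def measured_width(reply: str) -> int:
    """Number of cells the cursor advanced from column 1 while the
    probe character was printed."""
    _row, col = parse_cursor_report(reply)
    return col - 1
```

A caller would write probe_sequence(ch) to the terminal, read the reply, and
pass it to measured_width(). A reply placing the cursor in column 3 means the
terminal advanced two cells for that character, whatever wcwidth() says.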
Since there is no feasible way to provide reliable configured information
about terminal behaviour regarding character widths etc., my editor mined
(http://towo.net/mined) performs auto-detection of terminal properties, so
it knows what features the terminal really has instead of having to guess.

Kind regards,
Thomas Wolff

Bruno Haible <[EMAIL PROTECTED]> wrote:
> Egmont Koblinger asked:
> > - Where can I find specification about the terminal width of each and every
> > Unicode character?
>
> http://www.unicode.org/reports/tr11/ and the Unicode character database 4.1.
>
> > - Is glibc's wcwidth() considered to be a good implementation?
>
> Yes. Note that for characters with ambiguous width (where the width is 1
> in European contexts and 2 in Japanese contexts) it returns 1.
>
> > What about
> > the cases where it returns -1, including U+0603 mentioned above?
>
> -1 is returned for control characters and similar, where the cursor
> movement is not predictable.
>
> > - Is it clearly a bug in the terminal emulator (gnome-terminal/vte) if it
> > moves the cursor for a character whose wcwidth is zero? (I guess it is, and
> > I found it in gnome's bugzilla as #162262.)
>
> Yes. A terminal emulator is supposed to display these zero-width and
> combining characters in a way that doesn't move the cursor.
>
> > - Is it documented somewhere what a terminal emulator should do if it
> > receives a character whose wcwidth equals -1?
>
> These are control characters. For some, like U+000A, the semantics is
> clear; for others, it is unknown.
>
> > - What shall a terminal emulator do with the cursor position if it receives
> > a character that is not assigned and known that won't be assigned
>
> Undefined behaviour.
>
> > or when it receives a character that is not yet assigned?
>
> It should assume that it is a normal graphic character whose width is
> 1, 2, or 0, depending on the numeric code of the character. For example,
> the characters U+20000..U+2FFFD and U+30000..U+3FFFD all have width 2,
> although many of them are not yet assigned.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
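The rules Bruno describes above (control characters give -1, combining
characters give 0, ambiguous-width characters give 1, and the ranges
U+20000..U+2FFFD and U+30000..U+3FFFD give 2 even where unassigned) can be
sketched roughly as follows. This is an illustration using Python's
unicodedata module, not glibc's actual table: the function name is made up,
format characters such as U+0603 are not treated specially, and the result
depends on the Unicode version of the Python build. As Thomas points out, it
also says nothing about what a given terminal really does.

```python
import unicodedata


def suggested_width(ch: str) -> int:
    """Rough wcwidth()-like column count for a single character,
    based only on Unicode data (hypothetical helper, simplified)."""
    cp = ord(ch)
    # Supplementary ideographic planes: width 2, assigned or not.
    if 0x20000 <= cp <= 0x2FFFD or 0x30000 <= cp <= 0x3FFFD:
        return 2
    category = unicodedata.category(ch)
    if category == "Cc":
        # Control characters: cursor movement is not predictable.
        return -1
    if category in ("Mn", "Me"):
        # Nonspacing and enclosing combining marks do not advance the cursor.
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        # Wide and fullwidth characters occupy two cells.
        return 2
    # Everything else, including ambiguous-width ("A") characters.
    return 1
```

For instance, a CJK ideograph such as U+4E2D comes out as 2, a combining
acute accent (U+0301) as 0, and U+000A as -1.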
