Egmont Koblinger <[EMAIL PROTECTED]> wrote:

Hi,

> I noticed that in a full UTF-8 environment editing a binary file with
> joe-3.1 sooner or later may cause some display corruptions, e.g. parts of
> the file overwrite the topmost row (status bar of joe) after pressing a
> PageDown.
> 
> Looking closer at the problem I noticed that different terminal emulators
> behave differently, and the problem is caused by certain Unicode entities
> having different width under various terminal emulators.
> 
> ...
Yes, that's a problem, and it can hardly be solved because it stems both
from evolving versions and from the different features that various
terminals implement.


> Okay, so here come my questions:
> 
> - Where can I find a specification of the terminal width of each and every
> Unicode character?
Nowhere. The reference mentioned by Bruno defines what Unicode thinks
should be done, and even then it leaves a number of ambiguous cases open.
It doesn't give you reliable information on what your terminal actually
does. For example, xterm implements double-width and combining characters;
the Linux console does not. There is no standard and no system information
that would tell you how the terminal currently in use handles them.

> - Is glibc's wcwidth() considered to be a good implementation? What about
> the cases where it returns -1, including U+0603 mentioned above?
That depends on what you need it for. My answer would be no, because
it doesn't tell you how cell-spaced terminals and terminal emulators
actually behave.

> ...
> 
> - What shall a terminal emulator do with the cursor position if it receives
> a character that is not assigned and known that it won't be assigned, or when it
> receives a character that is not yet assigned?
It should display an appropriate replacement indication (in a width 
as explained by Bruno) such as U+FFFD REPLACEMENT CHARACTER, or a 
plain box.

Since reliable, configured information about terminal behaviour
regarding character widths etc. cannot feasibly be provided, my editor
mined (http://towo.net/mined) auto-detects terminal properties, so it
knows which features the terminal really has instead of having to guess.
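One common way to ask a terminal what it really does is to print a
character and compare the cursor column before and after, using the
Device Status Report query ESC [ 6 n, which terminals answer with
ESC [ row ; col R. The sketch below illustrates that general technique
only; it is not mined's code, mined's own detection may work differently,
and the names parse_cpr() and probe_width() are hypothetical. It assumes
a Unix system with /dev/tty and a terminal that answers the query.

```python
import os
import re

# A terminal answers the Device Status Report "ESC [ 6 n" with a
# Cursor Position Report of the form "ESC [ row ; col R" (1-based).
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: bytes) -> tuple[int, int]:
    """Extract (row, column) from a Cursor Position Report."""
    m = _CPR.search(reply)
    if m is None:
        raise ValueError("no cursor position report in reply")
    return int(m.group(1)), int(m.group(2))


def _query_column(fd: int) -> int:
    """Send a DSR query and read until the terminal's reply ends in 'R'."""
    os.write(fd, b"\x1b[6n")
    reply = b""
    while not reply.endswith(b"R"):
        chunk = os.read(fd, 32)
        if not chunk:
            break
        reply += chunk
    return parse_cpr(reply)[1]


def probe_width(ch: str) -> int:
    """Return how many columns the attached terminal advanced for `ch`.

    Hypothetical helper; Unix-only, and it blocks if the terminal never
    answers the query.
    """
    import termios  # imported here so parse_cpr() works without a tty
    import tty

    fd = os.open("/dev/tty", os.O_RDWR)
    try:
        saved = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)              # no echo, no line buffering
            os.write(fd, b"\r")         # start from column 1
            before = _query_column(fd)
            os.write(fd, ch.encode("utf-8"))
            after = _query_column(fd)
            os.write(fd, b"\r\x1b[K")   # back to column 1, erase the probe
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    finally:
        os.close(fd)
    return after - before
```

Calling probe_width("\u4e00") from an interactive shell would be expected
to report 2 on a terminal that implements double-width characters, such as
xterm, and 1 on one that does not. Note that the probe erases the current
line, so a real program would run it before drawing its screen.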

Kind regards,
Thomas Wolff


Bruno Haible <[EMAIL PROTECTED]> wrote:
> Egmont Koblinger asked:
> > - Where can I find a specification of the terminal width of each and every
> > Unicode character?
> 
> http://www.unicode.org/reports/tr11/ and the Unicode character database 4.1.
> 
> > - Is glibc's wcwidth() considered to be a good implementation?
> 
> Yes. Note that for characters with ambiguous width (where the width is 1
> in European contexts and 2 in Japanese contexts) it returns 1.
> 
> > What about
> > the cases where it returns -1, including U+0603 mentioned above?
> 
> -1 is returned for control characters and similar, where the cursor
> movement is not predictable.
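The rules Bruno describes (ambiguous width counted as 1 outside CJK
contexts, -1 for control characters) can be approximated from the Unicode
character database. This is a minimal sketch using Python's standard
unicodedata module, not glibc's wcwidth(); approx_wcwidth() and its
ambiguous_wide flag are hypothetical names. Because it treats format
characters (category Cf) as zero width, its answer for U+0603 may differ
from glibc's, and its results follow whatever Unicode version Python ships.

```python
import unicodedata


def approx_wcwidth(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate column width of a single code point.

    Returns -1 for control characters, 0 for NUL, combining marks and
    format characters, 2 for East Asian Wide/Fullwidth characters, and
    1 otherwise. Ambiguous-width characters count as 1 unless
    ambiguous_wide is set (as in a CJK context).
    """
    if ch == "\0":
        return 0
    cat = unicodedata.category(ch)
    if cat == "Cc":                    # controls: cursor movement not predictable
        return -1
    if cat in ("Mn", "Me", "Cf"):      # combining marks, format characters
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):              # East Asian Wide or Fullwidth
        return 2
    if eaw == "A":                     # ambiguous: 1 in European contexts
        return 2 if ambiguous_wide else 1
    return 1
```

For example, the Greek letter U+03B1 is ambiguous, so it counts as 1 by
default and as 2 with ambiguous_wide=True.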
> 
> > - Is it clearly a bug in the terminal emulator (gnome-terminal/vte) if it
> > moves the cursor for a character whose wcwidth is zero? (I guess it is, and
> > I found it in gnome's bugzilla as #162262.)
> 
> Yes. A terminal emulator is supposed to display these zero-width and
> combining characters in a way that doesn't move the cursor.
> 
> > - Is it documented somewhere what a terminal emulator should do if it
> > receives a character whose wcwidth equals -1?
> 
> These are control characters. For some, like U+000A, the semantics is
> clear; for others, it is unknown.
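To make the cursor rules concrete, here is a toy model of how a terminal
emulator might track the cursor: zero-width and combining characters leave
the cursor where it is, LF has clear semantics, and other controls are
simply ignored. It is a sketch under simplifying assumptions, not how vte
or xterm are implemented; final_cursor() is a hypothetical name, and a
real emulator also implements BS, TAB, escape sequences and more.

```python
import unicodedata


def final_cursor(text: str, columns: int = 80) -> tuple[int, int]:
    """Simulate cursor movement; return the 0-based (row, col) afterwards.

    col is the index of the next free cell; it may equal `columns`,
    meaning a wrap is pending until the next printable character.
    """
    row = col = 0
    for ch in text:
        if ch == "\n":                 # LF: down one row, column unchanged
            row += 1
            continue
        if ch == "\r":                 # CR: back to the first column
            col = 0
            continue
        cat = unicodedata.category(ch)
        if cat == "Cc":                # other controls (BEL, ...): ignored here
            continue
        if cat in ("Mn", "Me", "Cf"):  # zero width: joins the previous cell,
            continue                   # so the cursor must not move
        width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        if col + width > columns:      # no room left on this row: wrap
            row, col = row + 1, 0
        col += width
    return row, col
```

Here "e\u0301" (e plus COMBINING ACUTE ACCENT) leaves the cursor one column
to the right, not two, which is the behaviour at stake in the vte bug.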
> 
> > - What shall a terminal emulator do with the cursor position if it receives
> > a character that is not assigned and known that it won't be assigned
> 
> Undefined behaviour.
> 
> > or when it receives a character that is not yet assigned?
> 
> It should assume that it is a normal graphic character whose width is
> 1, 2, or 0, depending on the numeric code of the character. For example,
> the characters U+20000..U+2FFFD and U+30000..U+3FFFD all have width 2,
> although many of them are not yet assigned.
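A terminal whose character tables are older than the text it receives can
fall back on the numeric code alone, as Bruno suggests. The sketch below
uses the two ranges he names; the zero-width rule for U+E0000..U+E0FFF
(a default-ignorable range) is an added assumption, and the helper names
are hypothetical. It uses Python's unicodedata to decide what is
unassigned, so "unassigned" means unknown to that database version.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    """True if the local Unicode database has no character at this code point."""
    return unicodedata.category(ch) == "Cn"


def unassigned_width(cp: int) -> int:
    """Guess a width for an unassigned code point from its numeric value."""
    if 0x20000 <= cp <= 0x2FFFD or 0x30000 <= cp <= 0x3FFFD:
        return 2   # U+20000..U+2FFFD and U+30000..U+3FFFD: wide
    if 0xE0000 <= cp <= 0xE0FFF:
        return 0   # assumption: default-ignorable range, no visible width
    return 1       # everything else: an ordinary narrow graphic character
```

The terminal could then draw U+FFFD or a plain box across that many cells,
as suggested earlier in the thread.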


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
