Hi,

I noticed that in a full UTF-8 environment, editing a binary file with
joe-3.1 may sooner or later cause display corruption, e.g. parts of the
file overwrite the topmost row (joe's status bar) after pressing
PageDown.

Looking more closely at the problem, I found that different terminal
emulators behave differently, and that the problem is caused by certain
Unicode characters having different widths in different terminal emulators.

(Just a side note: the typescript produced by 'script' shows that joe's
output is the same in all cases, which is not surprising since $TERM and the
screen size are the same, and that it is valid UTF-8.)

Under xterm, everything works perfectly. Under konsole, the display
corruption occurs rarely. Under gnome-terminal it happens more often.
(Note that there's a serious UTF-8 handling bug in gnome-terminal, gnome
bugzilla #154896, which is unrelated to this one and which I've already
fixed.)

gnome-terminal displays U+0483, U+064D and probably many other characters
in a column of their own, while konsole and xterm do not display them
separately (perhaps they modify the preceding character). According to
wcwidth(), these are zero-width characters. (Test case: echo -e 'x\322\203y'
-- this is U+0483.)
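
A minimal C sketch for double-checking the byte escapes used in these test
cases: it encodes a code point as UTF-8. The helper name utf8_encode is
hypothetical, not part of joe or glibc. For U+0483 it yields the bytes 0322
0203 (octal), i.e. exactly the \322\203 passed to echo -e.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a UTF-16 surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The same routine gives \331\215 for U+064D, so echo -e 'x\331\215y' should
exercise that character in the same way.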

U+0603 and probably other characters are displayed as stand-alone
characters by both konsole and gnome-terminal (actually I see a rectangle
due to a missing font, but that should be irrelevant), whereas in xterm
they modify the preceding character. (Try: echo -e 'x\330\203y'.) For this
character wcwidth() returns -1.
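
The wcwidth() values quoted above can be reproduced with a sketch like the
following. It assumes a glibc-like system and a UTF-8 locale selected
through the environment (e.g. LANG=en_US.UTF-8); the helper names are
hypothetical, and the exact values printed may differ between C library
versions, since their width tables follow newer Unicode releases.

```c
#define _XOPEN_SOURCE 700   /* makes <wchar.h> declare wcwidth() */
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Select LC_CTYPE from the environment and report whether its codeset is
 * UTF-8; wcwidth() answers for non-ASCII characters only make sense then. */
int use_environment_locale(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/* Columns wcwidth() reports for one code point:
 * 0 = zero-width, 1 or 2 = columns, -1 = not a printable character. */
int query_width(unsigned long cp)
{
    return wcwidth((wchar_t)cp);
}

/* Print the width of each character discussed in this report. */
void print_widths(void)
{
    static const unsigned long cps[] = { 0x0483, 0x064D, 0x0603 };
    for (size_t i = 0; i < sizeof cps / sizeof cps[0]; i++)
        printf("U+%04lX: wcwidth = %d\n", cps[i], query_width(cps[i]));
}
```

Calling use_environment_locale() and then print_widths() from main() prints
one line per code point; if use_environment_locale() returns 0, the results
for the non-ASCII characters are not meaningful.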

For all the characters mentioned above, joe assumes they are zero-width (I
don't know whether this is implemented in joe itself or whether it uses
ncurses functions), yet it still prints them to the terminal, expecting the
cursor to remain in the same position. If the terminal actually moves the
cursor, the line overflows and wraps, so the topmost line gets scrolled
out.
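
The overflow described above can be modelled by tracking the cursor column
under two width models: the application's (these characters are
zero-width) and a terminal's (every character gets its own cell, as
gnome-terminal was seen to do). Both models below are hypothetical
simplifications covering only the characters named in this report; they
are not joe's or vte's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A width model maps a code point to the number of columns it occupies. */
typedef int (*width_model)(uint32_t cp);

/* Hypothetical application view: the characters from this report are
 * zero-width, everything else takes one column. */
int app_width(uint32_t cp)
{
    switch (cp) {
    case 0x0483: case 0x064D: case 0x0603:
        return 0;
    default:
        return 1;
    }
}

/* Hypothetical terminal view: every character gets its own cell. */
int term_width(uint32_t cp)
{
    (void)cp;
    return 1;
}

/* Number of columns the cursor advances while printing s[0..n). */
size_t cursor_column(const uint32_t *s, size_t n, width_model width)
{
    size_t col = 0;
    for (size_t i = 0; i < n; i++) {
        int w = width(s[i]);
        if (w > 0)              /* negative widths are not counted */
            col += (size_t)w;
    }
    return col;
}

/* True if the application believes the line fits in `cols` columns but
 * the terminal needs more, i.e. the terminal will wrap unexpectedly. */
bool overflows(const uint32_t *s, size_t n, size_t cols)
{
    return cursor_column(s, n, app_width) <= cols
        && cursor_column(s, n, term_width) > cols;
}
```

For the line x, U+0483, y the application counts 2 columns while such a
terminal advances 3. On a row the application has already filled to the
right margin, that extra column forces a wrap, and the resulting scroll
pushes the topmost row off the screen.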

Okay, so here come my questions:

- Where can I find specification about the terminal width of each and every
Unicode character?

- Is glibc's wcwidth() considered to be a good implementation? What about
the cases where it returns -1, including U+0603 mentioned above?

- Is it clearly a bug in the terminal emulator (gnome-terminal/vte) if it
moves the cursor for a character whose wcwidth is zero? (I guess it is, and
I found it in gnome's bugzilla as #162262.)

- Is it documented anywhere what a terminal emulator should do if it
receives a character whose wcwidth() equals -1? Or is it the application's
(joe's or ncurses', which one?) responsibility not to print such
characters?

- What should a terminal emulator do with the cursor position if it
receives a character that is unassigned and known never to be assigned, or
a character that is not yet assigned?
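
On the question of characters with wcwidth() equal to -1: one conservative
policy an application could adopt while the answer is unclear is never to
send them, substituting a visible one-column placeholder instead. This is a
hypothetical sketch, not something joe or ncurses is known to do, and it
does not help with the zero-width case, where the disagreement is between
wcwidth() and the terminal.

```c
#define _XOPEN_SOURCE 700   /* makes <wchar.h> declare wcwidth() */
#include <stddef.h>
#include <wchar.h>

/* Replace, in place, every character of s[0..n) for which the width
 * function reports a negative width (non-printable) with `placeholder`.
 * Returns how many characters were replaced. */
size_t replace_unprintable(wchar_t *s, size_t n,
                           int (*width)(wchar_t), wchar_t placeholder)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (width(s[i]) < 0) {
            s[i] = placeholder;
            replaced++;
        }
    }
    return replaced;
}
```

Passing wcwidth itself as the width argument, e.g.
replace_unprintable(buf, len, wcwidth, L'?'), applies the policy using the
C library's current notion of printability in the active locale.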

Thanks,

Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
