Hi,

I noticed that in a full UTF-8 environment, editing a binary file with joe-3.1 can sooner or later cause display corruption; for example, parts of the file overwrite the topmost row (joe's status bar) after pressing PageDown.
Looking closer at the problem, I noticed that different terminal emulators behave differently, and that the problem is caused by certain Unicode characters having different widths under different terminal emulators. (Just a side note: the typescript produced by 'script' shows that joe's output is the same in all cases, which is not surprising since $TERM and the screen size are the same, and that it is valid UTF-8.)

Under xterm, everything works perfectly. Under konsole, the display corruption occurs rarely. Under gnome-terminal, it happens more often. (Note that there is a serious UTF-8 handling bug in gnome-terminal, gnome bugzilla #154896, which is unrelated to this one and which I have already fixed.)

U+0483, U+064D and probably many other characters are displayed by gnome-terminal in a column of their own, while konsole and xterm do not display them separately (maybe they modify the preceding character). According to wcwidth(), these are zero-width characters. (Test case: echo -e 'x\322\203y' (this is U+0483).)

U+0603 and probably other characters are displayed as stand-alone characters by both konsole and gnome-terminal (actually I see a rectangle due to a missing font, but that should be irrelevant), whereas in xterm they still modify the preceding character. (Try: echo -e 'x\330\203y'.) For this character wcwidth() returns -1.

For all the characters mentioned above, joe assumes they are zero-width (I don't know whether this is implemented manually in joe or whether it uses ncurses functions), yet it still prints them to the terminal, expecting the cursor to remain in the same position. If the terminal actually moves the cursor, the line overflows and wraps, and hence the topmost line gets scrolled out.

Okay, so here come my questions:

- Where can I find a specification of the terminal width of each and every Unicode character?

- Is glibc's wcwidth() considered to be a good implementation? What about the cases where it returns -1, including U+0603 mentioned above?

- Is it clearly a bug in the terminal emulator (gnome-terminal/vte) if it moves the cursor for a character whose wcwidth is zero? (I guess it is, and I found it in gnome's bugzilla as #162262.)

- Is it documented somewhere what a terminal emulator should do if it receives a character whose wcwidth equals -1? Or is it the application's (joe or ncurses, which?) responsibility not to print this kind of character?

- What should a terminal emulator do with the cursor position if it receives a character that is unassigned and known never to be assigned, or a character that is simply not yet assigned?

Thanks,
Egmont

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
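
P.S. To make the "application's responsibility" option from the questions above a bit more concrete, here is a minimal sketch in Python of what a binary-file viewer could do: replace every character whose on-screen width terminals might disagree about with a plain ASCII placeholder before printing. This is not how joe or ncurses actually work. The names RISKY_CATEGORIES, needs_escaping and sanitize are made up for illustration, and the choice of Unicode general categories is only my assumption about which characters are risky; it is not derived from wcwidth().

```python
import unicodedata

# Assumption: characters in these general categories are the ones terminals
# may disagree about (combining marks, format and control characters,
# unassigned, surrogate and private-use code points).
# Note that Cc also covers tab and newline, which suits a hex-dump style
# view of a binary file but not ordinary text editing.
RISKY_CATEGORIES = {"Mn", "Me", "Cf", "Cc", "Cn", "Cs", "Co"}


def needs_escaping(ch):
    """Return True if ch should not be sent to the terminal as-is."""
    return unicodedata.category(ch) in RISKY_CATEGORIES


def sanitize(text):
    """Replace risky characters with an ASCII placeholder such as <0483>."""
    return "".join(
        f"<{ord(ch):04X}>" if needs_escaping(ch) else ch
        for ch in text
    )


if __name__ == "__main__":
    # Same bytes as the echo test cases: U+0483 and U+0603, plus U+064D.
    for raw in (b"x\322\203y", b"x\331\215y", b"x\330\203y"):
        text = raw.decode("utf-8")
        print(sanitize(text))
```

Since the placeholder consists only of printable ASCII, the application knows exactly how many columns it occupies, whatever the terminal emulator thinks about the original character.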
