Bug#477366: linking ncurses-ruby against libncursesw5

Tobias Fri, 01 May 2009 17:39:15 -0700

Adeodato Simó suggests following through with the suggestion of this bug
report and linking ncurses-ruby against ncursesw instead of ncurses, because
he has observed that the sup email program displays non-ASCII
characters better on a UTF-8 terminal when linked that way.


I am the upstream author of ncurses-ruby. I admit that until today I had
no clear idea what the difference was between ncurses and ncursesw,
apart from ncursesw "somehow" enabling "wide characters". I have
investigated the matter today, and I recommend not linking ncurses-ruby
against ncursesw.

Reasoning:

I agree that it would be a good thing to have a Ruby ncurses binding
that links against ncursesw. Conventional ncurses (without the trailing
w) only works well with 8 bit charsets. A few years ago, this was not
a problem for most users, as it was common then for Linux
distributions to configure local 8 bit charsets like ISO-8859-1. Now,
however, virtually every Linux installation defaults to the UTF-8
character encoding, with the consequence that non-ASCII characters
require more than one byte each. Ncurses programs that worked fine in
the old environment will no longer display non-ASCII characters reliably.
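
As a minimal Ruby sketch of the size difference (the sample word is
chosen here for illustration and is not taken from the bug report), the
same text needs one byte per character in ISO-8859-1 but more in UTF-8:

```ruby
# Sample text chosen for illustration only.
utf8   = "Sim\u00f3"                # "Simó", UTF-8 encoded
latin1 = utf8.encode("ISO-8859-1")  # the same text in an 8 bit charset

puts latin1.bytesize  # 4: one byte per character
puts utf8.bytesize    # 5: the non-ASCII character takes two bytes
puts utf8.length      # 4: the number of characters is unchanged
```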

Is ncursesw the rescue? Yes, but it's not that simple. You cannot simply
link an ncurses program against ncursesw and expect it to magically work
with UTF-8 strings. In the email program mentioned above, you will still
notice display errors when you use the cursor keys to highlight a line
in the message body that contains non-ASCII characters: not the whole
line is highlighted, and a few character cells remain black. If an
email runs over several pages, then flipping the pages may cause some
garbage from the previous page to remain on the screen in lines
containing non-ASCII characters.

What is happening? The email program still calls mvaddstr with a UTF-8
encoded string. As far as ncurses(w) is concerned, the multiple bytes
that make up a single non-ASCII character are distributed to different
character cells on the screen. The only reason why the user can
recognise the original non-ASCII character on the screen is that ncurses
probably also happens to "print" the sub-character bytes in the correct
sequence to the terminal, which then interprets the resulting UTF-8
encoding. However, after the printing, the terminal and the ncurses(w)
library disagree on the horizontal position of the cursor.
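
The disagreement can be modelled in a few lines of Ruby. This is only a
sketch of the behaviour described above, not ncurses code: both helper
methods are hypothetical, and the terminal model assumes that every
character is exactly one column wide (no combining or double-width
characters).

```ruby
# Hypothetical model: a library that assigns one screen cell per byte
# believes the cursor has advanced by the byte count.
def library_columns(str)
  str.bytesize
end

# Hypothetical model: a UTF-8 terminal advances the cursor once per
# decoded character (assuming one column per character).
def terminal_columns(str)
  str.length
end

line  = "Gr\u00fc\u00dfe"  # "Grüße", sample text for illustration
drift = library_columns(line) - terminal_columns(line)

puts library_columns(line)   # 7
puts terminal_columns(line)  # 5
puts drift                   # 2: one extra cell per two-byte character
```

Under this model the two sides are off by one cell for every additional
byte in a multibyte character, which would be consistent with a few
cells at the end of a line being left unhighlighted.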

The correct way to use ncursesw to print non-ASCII, UTF-8 encoded
characters on a UTF-8 terminal is for the application to split the
string to print into (possibly multibyte) characters, compute the
Unicode code point for each character, and call the wide character
functions of ncursesw (e.g. mvadd_wch, mvaddwstr). This requires an
ncursesw-ruby wrapper as well as changes to the application. Looking at
the source code of the mailer, I'd say that it is not really suited for
UTF-8 encoded strings yet, as it still assumes that the length of a
string in bytes is equal to the number of characters in the string.
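
The application-side part of this (splitting and computing code points)
might be sketched in Ruby as follows. The sample text is an assumption
made for illustration, and no ncursesw call is made, since the wide
character wrapper discussed here does not exist yet.

```ruby
text = "na\u00efve"  # "naïve", sample text for illustration

# Split into characters rather than bytes.
chars = text.each_char.to_a

# One Unicode code point per character; these are the values a wide
# character wrapper would have to receive.
codepoints = text.codepoints

chars.zip(codepoints).each do |ch, cp|
  puts format("%s U+%04X (%d byte(s) in UTF-8)", ch, cp, ch.bytesize)
end
```

Here the third character has code point U+00EF and occupies two bytes,
so the string is five characters but six bytes long.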

Conclusions:
- The switch from 8 bit character sets to UTF-8 requires serious
modifications to applications using ncurses. Ncursesw cannot be used as
a drop-in replacement.
- A separate ncursesw-ruby wrapper is desirable. It has to export the
additional wide character functions.

Tobias



