Re: ncurses plus utf-8

towo Wed, 13 Oct 2004 16:36:39 -0700

[EMAIL PROTECTED] wrote:
> Following is a post made by Elias Martenson some time ago.
> This little tidbit of information does not seem to be in the ncurses
> manpages, and isn't easy to find on Google.
> ...
> You have to remember to link with -lncursesw instead of -lncurses.
> 
> Also don't forget to issue the call to setlocale(LC_ALL,"") in the
> beginning of the program. Although I suppose you already do. :-)
> ...
> If you want to use UTF-8, you're pretty much done! Just work with the
> UTF-8 strings just like any other string. Just remember to use wcslen()
> instead of strlen() if you want the number of characters. This is
> particularly important when doing formatting for a curses app.

Ah, this finally works with ncursesw 5.4.
I found one bug with double-width characters:
* After positioning the cursor in the middle of a double-width character,
  subsequent output often appears one cell to the left or right of the
  correct position, resulting in screen garbage (e.g., if a menu is opened
  over double-width text).

Markus Kuhn wrote:
> For screen layout purposes, don't you really want to use wcswidth()
> instead of wcslen()? The former gives you the number of character cells
> that a string will move the cursor forward on a typical UTF-8 terminal,
> the latter gives you merely the number of characters, each of which may
> consume 0, 1, or 2 character cells.

A problem here is "what is a typical UTF-8 terminal?" or rather
"does my terminal behave in a typical way, and if not, what does it do?",
and how can wcswidth know about it?
There are many variations, some caused by historic development
(different versions of Unicode data and adjusted data tables in
terminals), others caused by different capabilities (the Linux console
has no double-width characters, some terminals support combining
characters and others don't, mlterm automatically joins Arabic ligatures,
xterm has a weird legacy option -cjk_width, non-BMP support may vary, etc.).

Relying on locale information alone, wcswidth can hardly know
how wide a string will be on the terminal in use, unless every
existing combination of variations were compiled into a separate
locale, which no one will ever do.
And could mlterm's ligature joining be expressed in a locale at all?
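
To make the distinction between character count and cell width concrete,
here is a minimal sketch in C. The helper names (to_wide, mb_char_count,
mb_cell_width) and the 256-character limit are assumptions made for this
illustration, not part of ncurses or of any post in this thread. Note that
the width comes purely from the locale's tables; the terminal is never
consulted, which is exactly the limitation discussed above.

```c
#define _XOPEN_SOURCE 700   /* asks glibc to declare wcswidth() */
#include <stdlib.h>
#include <wchar.h>

#define MAX_WCHARS 256      /* arbitrary limit chosen for this sketch */

/* Convert a multibyte string (UTF-8 when running in a UTF-8 locale) to
   wide characters. Returns the number of characters, or -1 if the bytes
   are invalid in the current locale or the string does not fit. */
static long to_wide(const char *mbs, wchar_t *out)
{
    size_t n = mbstowcs(out, mbs, MAX_WCHARS);
    if (n == (size_t)-1 || n >= MAX_WCHARS)
        return -1;
    return (long)n;
}

/* Number of characters (what wcslen would report after conversion). */
long mb_char_count(const char *mbs)
{
    wchar_t buf[MAX_WCHARS];
    return to_wide(mbs, buf);
}

/* Number of character cells according to the locale's width tables.
   Returns -1 if the string is invalid or contains a character the
   locale considers non-printable. */
int mb_cell_width(const char *mbs)
{
    wchar_t buf[MAX_WCHARS];
    long n = to_wide(mbs, buf);
    if (n < 0)
        return -1;
    return wcswidth(buf, (size_t)n);
}
```

Both helpers assume the program has already called setlocale(LC_ALL, "")
as described in the quoted post. In a UTF-8 locale a CJK ideograph will
typically count as one character but two cells, while a combining accent
counts as one character and zero cells; whether the terminal actually
renders it that way is a separate question.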

My editor mined (http://towo.net/mined/) handles all these variations
by auto-detecting the terminal's properties. Thus a user with a
UTF-8 terminal will always have text width and cursor positioning
handled properly by mined, regardless of locale configuration (i.e.,
I'm not using wcswidth). I think this is a major advantage, and I would
recommend considering this approach.
Would it be a valuable option for ncurses? If you consider it useful,
I would be willing to extract the code as a contribution.
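
The post does not say how mined performs its detection. One widely used
technique, assumed here purely for illustration, is to print a probe
string and ask the terminal where the cursor ended up, using the standard
"device status report" request ESC [ 6 n, which the terminal answers with
ESC [ row ; col R. The function names below (cpr_column, query_column,
measure_cells) are hypothetical, and the sketch is not taken from mined.

```c
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Extract the column from a cursor position report "ESC [ row ; col R".
   Returns the 1-based column, or -1 if the reply is malformed. */
int cpr_column(const char *reply)
{
    int row, col;
    char final;
    if (sscanf(reply, "\033[%d;%d%c", &row, &col, &final) != 3 || final != 'R')
        return -1;
    return col;
}

/* Send ESC [ 6 n and read the reply byte by byte up to the final 'R'. */
static int query_column(int fd)
{
    char buf[32];
    size_t len = 0;
    if (write(fd, "\033[6n", 4) != 4)
        return -1;
    while (len < sizeof buf - 1) {
        if (read(fd, buf + len, 1) != 1)   /* error or timeout */
            return -1;
        if (buf[len++] == 'R')
            break;
    }
    buf[len] = '\0';
    return cpr_column(buf);
}

/* Measure how many cells the terminal on fd advances for a UTF-8 string.
   fd must be a terminal opened for reading and writing (e.g. /dev/tty).
   Returns the width in cells, or -1 if it could not be determined. */
int measure_cells(int fd, const char *utf8)
{
    struct termios saved, raw;
    int before, after = -1;
    size_t len = strlen(utf8);

    if (tcgetattr(fd, &saved) != 0)        /* fails if fd is not a terminal */
        return -1;
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);       /* unbuffered reply, no echo */
    raw.c_cc[VMIN] = 0;
    raw.c_cc[VTIME] = 10;                  /* give up after 1 s of silence */
    if (tcsetattr(fd, TCSAFLUSH, &raw) != 0)
        return -1;

    if (write(fd, "\r", 1) == 1) {         /* start the probe at column 1 */
        before = query_column(fd);
        if (before > 0 && write(fd, utf8, len) == (ssize_t)len)
            after = query_column(fd);
        if (write(fd, "\r\033[K", 4) != 4) /* erase the probe text */
            after = -1;
    } else {
        before = -1;
    }
    tcsetattr(fd, TCSAFLUSH, &saved);
    return (before > 0 && after > 0) ? after - before : -1;
}
```

A program could call measure_cells once at startup with a handful of
probe strings (a double-width ideograph, a base letter plus combining
accent, a non-BMP character) and adapt its width calculation to the
answers, instead of trusting the locale tables alone.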

Kind regards,
Thomas Wolff

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
