On Wed, 18 Sep 2002 11:07:27 -0400
"Maiorana, Jason" <[EMAIL PROTECTED]> wrote:
>
> >I have this simple little program, it uses locales (a bit) and even
> >has simple gettext internationalisation, now I want to convert it so
> >that it'll work on a completely UTF-8 locale _or_ a ISO8859-* locale
> >(as it does now) or even an ISO8859-* interface on a UTF-8 system.
>
> If you don't want to worry about it too much, just use the mb functions
> and let the locale control what they do:
>
> mblen to step through characters, instead of counting bytes with strlen
> strcoll instead of strcmp
> etc.
>
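A minimal sketch of that locale-driven approach, in case it helps. It counts characters rather than bytes with mbrtowc (the restartable relative of mblen) and sorts with strcoll. The names mb_char_count and compare_in_locale are made up for illustration, and the caller is assumed to have already called setlocale(LC_ALL, "").

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in a multibyte
 * string, decoded according to the current LC_CTYPE locale.
 * Returns (size_t)-1 on an invalid or truncated sequence. */
size_t mb_char_count(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */

    size_t remaining = strlen(s);
    size_t count = 0;
    while (remaining > 0) {
        /* A NULL first argument means "decode but do not store". */
        size_t n = mbrtowc(NULL, s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete sequence */
        if (n == 0)
            break;                        /* decoded a NUL; not expected here */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}

/* Hypothetical qsort comparator for an array of char pointers:
 * orders strings by the collation rules of the current LC_COLLATE locale. */
int compare_in_locale(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}
```

With a UTF-8 LC_CTYPE in effect, mb_char_count("caf\xC3\xA9") should report 4 where strlen reports 5; under an ISO8859-* locale the same five bytes count as five characters.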
> If you want to use hardcoded internal UTF-8 and convert only on
> output, then iconv is perfect, and shockingly easy to use.
> I'm actually a fan of completely ignoring the locale as far as codesets
> go: I'll use UTF-8 internally, and always output UTF-8. (Locales
> are fine for date formatting.)
>
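For the convert-on-output route, a sketch of what the iconv call sequence might look like. The helper name utf8_to_codeset and its fixed-buffer interface are assumptions for illustration; only iconv_open, iconv and iconv_close are the real API. Which codeset names are accepted depends on the iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to the
 * codeset named `to` (for example "ISO-8859-1"), storing a NUL-terminated
 * result in `out`. Returns the number of bytes written, not counting the
 * NUL, or (size_t)-1 if the conversion fails or `out` is too small. */
size_t utf8_to_codeset(const char *to, const char *in, char *out, size_t outsize)
{
    assert(to != NULL && in != NULL && out != NULL);
    if (outsize == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(to, "UTF-8");   /* target first, then source */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                  /* conversion not supported */

    char *inptr = (char *)in;      /* iconv wants char **; input is not modified */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;  /* keep one byte for the terminating NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1)
        /* Flush any pending shift sequence (matters for stateful codesets). */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    *outptr = '\0';
    return (size_t)(outptr - out);
}
```

A caller could pass nl_langinfo(CODESET) as `to` to follow the user's locale, or a fixed name to ignore it.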
> To go that route you do need a good UTF-8 to wchar_t converter,
> and occasionally a wchar_t to UTF-8 formatter. These things are
> ubiquitous; you can even write your own:
>
>
> //here is an example utf-8 formatter
> //it turns ucs-4 character "value" into a utf-8 string held in "buf"
> //which must have room for at least 6 bytes
> //the return value is the length of the utf-8 string
>
> int ucs4toutf8( wchar_t value, unsigned char *buf )
This assumes that wchar_t holds ISO 10646 code points, i.e. that the
implementation defines __STDC_ISO_10646__. Where that is not guaranteed,
you have to use wctomb and mbtowc instead.
> {
>     if( value <= 0x0000007F )
>     {
>         buf[0] = (unsigned char)value;
>         return 1;
>     }
>     else if( value <= 0x000007FF )
>     {
>         buf[1] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[0] = (unsigned char)((value & 0x1F) | 0xC0);
>         return 2;
>     }
>     else if( value <= 0x0000FFFF )
>     {
>         buf[2] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[1] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[0] = (unsigned char)((value & 0x0F) | 0xE0);
>         return 3;
>     }
>     else if( value <= 0x001FFFFF )
>     {
>         buf[3] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[2] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[1] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[0] = (unsigned char)((value & 0x07) | 0xF0);
>         return 4;
>     }
>     else if( value <= 0x03FFFFFF )
>     {
>         buf[4] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[3] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[2] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[1] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[0] = (unsigned char)((value & 0x03) | 0xF8);
>         return 5;
>     }
>     else if( value <= 0x7FFFFFFF )
>     {
>         buf[5] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[4] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[3] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[2] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[1] = (unsigned char)((value & 0x3F) | 0x80);
>         value >>= 6;
>         buf[0] = (unsigned char)((value & 0x01) | 0xFC);
>         return 6;
>     }
>     return 0;
> }
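Going the other way (the UTF-8 to wchar_t converter mentioned above) can be sketched along the same lines. This is only an illustration: the name utf8toucs4 and its interface are made up, it accepts the same 1- to 6-byte forms the encoder produces, and it stores the result in an unsigned long so that it makes no assumption about wchar_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counterpart to ucs4toutf8: decode one UTF-8 sequence from
 * `buf` (at most `len` bytes available) into *value.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated, malformed, or overlong. */
int utf8toucs4(const unsigned char *buf, size_t len, unsigned long *value)
{
    /* Smallest value that legitimately needs n bytes, indexed by n. */
    static const unsigned long min_value[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
    };

    assert(buf != NULL && value != NULL);
    if (len == 0)
        return 0;

    unsigned char lead = buf[0];
    unsigned long v;
    int n;

    /* The lead byte gives the sequence length and the top payload bits. */
    if (lead < 0x80)      { n = 1; v = lead; }
    else if (lead < 0xC0) return 0;              /* stray continuation byte */
    else if (lead < 0xE0) { n = 2; v = lead & 0x1F; }
    else if (lead < 0xF0) { n = 3; v = lead & 0x0F; }
    else if (lead < 0xF8) { n = 4; v = lead & 0x07; }
    else if (lead < 0xFC) { n = 5; v = lead & 0x03; }
    else if (lead < 0xFE) { n = 6; v = lead & 0x01; }
    else return 0;                               /* 0xFE and 0xFF are never valid */

    if ((size_t)n > len)
        return 0;                                /* truncated input */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (int i = 1; i < n; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (unsigned long)(buf[i] & 0x3F);
    }

    if (v < min_value[n])
        return 0;                                /* overlong encoding */

    *value = v;
    return n;
}
```

Rejecting overlong forms is a design choice here; without that check, a byte pair such as 0xC0 0x80 would decode to 0 and could slip a NUL past code that only inspects raw bytes.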
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/
>
>
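On the __STDC_ISO_10646__ point, here is a rough sketch of the portable route, which leaves the encoding to the C library so that the output is in whatever codeset the current locale uses. It relies on wcrtomb, the restartable variant of wctomb; the name wide_to_locale and its buffer interface are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the wide string `ws` to the multibyte
 * encoding of the current LC_CTYPE locale, storing a NUL-terminated
 * result in `out`. Returns the number of bytes written, not counting the
 * NUL, or (size_t)-1 if a character cannot be represented in the locale
 * or `out` is too small. */
size_t wide_to_locale(const wchar_t *ws, char *out, size_t outsize)
{
    assert(ws != NULL && out != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */

    size_t used = 0;
    for (;;) {
        char tmp[MB_LEN_MAX];             /* large enough for any one character */
        size_t n = wcrtomb(tmp, *ws, &state);
        if (n == (size_t)-1)
            return (size_t)-1;            /* not representable in this locale */
        if (n > outsize - used)
            return (size_t)-1;            /* output buffer too small */
        memcpy(out + used, tmp, n);
        used += n;
        if (*ws == L'\0')
            return used - 1;              /* the terminator itself was just written */
        ws++;
    }
}
```

Converting the terminating L'\0' through wcrtomb as well means any shift sequence needed to return to the initial state is emitted before the NUL, which matters only for stateful encodings.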
--
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and more importantly to tasks that have not
yet been conceived.