RE: Lazy man's UTF8

Robert de Bath Wed, 18 Sep 2002 13:51:08 -0700

On Wed, 18 Sep 2002, Maiorana, Jason wrote:

>
> >I have this simple little program, it uses locales (a bit) and even
> >has simple gettext internationalisation, now I want to convert it so
> >that it'll work on a completely UTF-8 locale _or_ a ISO8859-* locale
> >(as it does now) or even an ISO8859-* interface on a UTF-8 system.
>
> If you don't want to worry about it too much, just use the mb functions
> and let the locale control what they do:
>
> mblen instead of strlen
> strcoll instead of strcmp
> etc


NB: mblen and mbrlen are not the same as strlen; they're kind of replacements
    for strptr++
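
To illustrate the strptr++ point, here is a minimal sketch of stepping
through a string one character at a time with mbrlen. The helper name
count_chars is made up for this example, and it assumes the program has
already called setlocale(LC_CTYPE, "") so that the locale's encoding is
the one in effect.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count characters (not bytes) in a multibyte string, using the encoding
 * of the current LC_CTYPE locale. Returns (size_t)-1 if the string holds
 * an invalid or truncated sequence. count_chars is a hypothetical helper. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);    /* bytes remaining */
    size_t count = 0;

    memset(&st, 0, sizeof st);  /* start in the initial conversion state */
    while (left > 0) {
        /* mbrlen reports how many bytes the next character occupies;
         * advancing by that amount is the per-character "strptr++". */
        size_t n = mbrlen(s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;  /* invalid or incomplete sequence */
        if (n == 0)
            break;              /* NUL character; not expected within strlen(s) bytes */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

In a single-byte locale this gives the same answer as strlen; in a UTF-8
locale it gives the number of characters, which is usually what the old
strlen call was really after.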

But the mb* functions are all for wide characters. "Mr. Lazy" knows about
wide characters and thinks they're a pain, especially for already existing
code.

> If you want to use hardcoded internal utf-8, then only convert on
> output, then iconv is perfect, and shockingly easy to use.
> I'm actually a fan of completely ignoring locale as far as codesets
> go: I'll use utf-8 internally, and always output utf-8. (Locales
> are fine for date formatting)

iconv() is _fairly_ easy to use. The problem isn't that it's difficult,
just that there's a lot you have to remember to do for a function that
appears (at first) to have a simple job.
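
As a rough sketch of the "lot you have to remember", here is one way to
convert a whole string with iconv into a fixed buffer. The helper name
convert_string and its return convention are invented for this example;
a real program would probably loop and grow the buffer on E2BIG rather
than give up.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from encoding `from` to encoding
 * `to`, writing a NUL-terminated result into out[0..outsize-1].
 * Returns 0 on success, -1 on failure with errno left as iconv set it.
 * convert_string is a hypothetical helper, not part of any library. */
int convert_string(const char *to, const char *from,
                   const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv wants char **; the input is only read */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    int rc = 0;
    int saved;

    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }
    outleft = outsize - 1;      /* keep one byte back for the terminator */

    cd = iconv_open(to, from);  /* remember: target encoding comes first */
    if (cd == (iconv_t)-1)
        return -1;

    /* Remember: (size_t)-1 covers three different errors, told apart by
     * errno: EILSEQ (invalid input), EINVAL (input ends mid-character),
     * E2BIG (output buffer full). */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        rc = -1;
    /* Remember: a final call with a NULL input flushes any pending shift
     * sequence for stateful encodings. */
    else if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        rc = -1;

    *outp = '\0';               /* remember: iconv never terminates the output */

    saved = errno;              /* keep iconv's errno across iconv_close */
    iconv_close(cd);
    errno = saved;
    return rc;
}
```

A call would look like convert_string("ISO-8859-1", "UTF-8", text, buf,
sizeof buf). Which encoding names are accepted depends on the iconv
implementation.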

> To go that route you do need a good utf-8 to wchar_t converter,
> and wchar_t to utf-8 layouter occasionally. These things are
> ubiquitous, you can even write your own:

> //here is an example utf-8 formatter

BTDTGTTS.

But, you're converting utf-8 values that (strictly speaking) are out of
range _and_ assuming the wchar_t is a UCS character.
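
For comparison, here is a sketch of a decoder that avoids both problems.
It is not the code from the quoted message; utf8_decode_strict is a
made-up name, and "in range" is assumed here to mean U+0000..U+10FFFF
with no surrogates and no overlong forms, as in RFC 3629. The result goes
into a uint32_t, so nothing is assumed about what wchar_t holds. (An
implementation that defines __STDC_ISO_10646__ does promise that wchar_t
values are ISO 10646 code points.)

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is truncated,
 * malformed, overlong, a surrogate, or above U+10FFFF.
 * utf8_decode_strict is a hypothetical helper. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else
        return 0;               /* stray continuation byte or 5/6-byte lead */

    if (n < len)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len])
        return 0;               /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;               /* UTF-16 surrogate */
    if (c > 0x10FFFF)
        return 0;               /* beyond the Unicode range */
    *cp = c;
    return len;
}
```

The caller advances by the returned length and treats 0 as an error, for
example by skipping one byte and substituting U+FFFD.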


-- 
Rob.                          (Robert de Bath <robert$ @ debath.co.uk>)
                                       <http://www.cix.co.uk/~mayday>



--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
