Rich Felker wrote:
On Sat, Feb 24, 2007 at 06:13:37PM +0100, Julien Claassen wrote:
Hi!
What I meant about UTF-8-strings in c++: I mean in c and c++ they're not standard like in Java.

UTF-16, used by Java, is also variable-width. It can be either 2 bytes
or 4 bytes per character. Support for the characters that use 4 bytes
is generally very poor due to the misconception that it's
fixed-width.. :(

I think UTF-8 is a variable width multibyte charset, so there are specific problems in handling them allocating the right space. I mean the Glib contains something like UString and QT has its QStrings, which I think are also UTF-8 capable.
As far as i know:

Using UTF-8 in C or C++ is very simple:
As UTF-8 may not contain '\0' you can simply use all
functions as before (strcmp(), std::string etc.).
Old code doesn't need to be ported.

The only place to take care is when interfacing other libraries
using wchar_t and such (UTF-16, UTF-32), here
you need to convert using functions like wcstrtombs(), mbstrtowcs(), mbrtowc() and such.
This works well on Linux, Windows or other OS,

Marcel

All strings are UTF-8 capable; the unit of data is simply bytes
instead of characters. If you're looking for a class that treats
strings as a sequence of abstract characters rather than a sequence of
bytes, you could look for a library to do this or write your own.
However I suspect the most useful way to do this on C++ would be to
extend whatever standard byte-based string class you're using with a
derived class.

Maybe there's something like this built in to the C++ STL classes
already that I'm not aware of. As I said I don't know much of (modern)
C++. Can someone who knows the language better provide an answer?

It would also be easier to provide you answers if we knew better what
you're trying to do with the strings, i.e. whether you just need to
store them and spit them back in output, or whether you need to do
higher-level unicode processing like line breaks, collation,
rendering, etc.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/




--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/

Reply via email to