On Sat, Feb 24, 2007 at 06:13:37PM +0100, Julien Claassen wrote:
> Hi!
> What I meant about UTF-8-strings in c++: I mean in c and c++ they're not
> standard like in Java.
UTF-16, used by Java, is also variable-width: each character takes either 2 or 4 bytes. Support for the characters that need 4 bytes is generally very poor because of the misconception that UTF-16 is fixed-width. :(

> I think UTF-8 is a variable width multibyte charset, so
> there are specific problems in handling them allocating the right space. I
> mean the Glib contains something like UString and QT has its QStrings, which
> I think are also UTF-8 capable.

All strings are UTF-8 capable; the unit of data is simply the byte rather than the character. If you're looking for a class that treats a string as a sequence of abstract characters rather than a sequence of bytes, you could look for a library that does this or write your own. However, I suspect the most useful way to do this in C++ would be to extend whatever standard byte-based string class you're using with a derived class. Maybe something like this is already built into the C++ STL classes and I'm just not aware of it; as I said, I don't know much (modern) C++. Can someone who knows the language better provide an answer?

It would also be easier to give you answers if we knew more about what you're trying to do with the strings, i.e. whether you just need to store them and spit them back out, or whether you need higher-level Unicode processing such as line breaking, collation, rendering, etc.

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
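The byte-versus-character distinction discussed in this thread can be sketched in C++ roughly as follows. This is a minimal illustration assuming C++11 or later (which postdates the thread, and adds `char32_t`); `utf8_decode`, `utf8_encode` and `utf16_units` are hypothetical helpers written for this example, not part of the standard library, glibmm or Qt. `std::string::size()` counts bytes, so "héllo" is 6 bytes but 5 code points, and a code point above U+FFFF needs 4 UTF-8 bytes and two UTF-16 code units (a surrogate pair), which is the variable width mentioned for Java above.

```cpp
#include <cassert>   // for assert() in the usage checks
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Throws std::runtime_error on malformed input (bad lead or continuation
// bytes, truncation, overlong forms, surrogates, values above U+10FFFF).
std::vector<char32_t> utf8_decode(const std::string& bytes) {
    static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode one code point as 1 to 4 UTF-8 bytes.
std::string utf8_encode(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::runtime_error("not a Unicode scalar value");
    std::string s;
    if (cp < 0x80) {
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}

// Hypothetical helper: number of UTF-16 code units the code points would need
// (code points above U+FFFF become a surrogate pair, i.e. two units).
std::size_t utf16_units(const std::vector<char32_t>& cps) {
    std::size_t n = 0;
    for (char32_t cp : cps) n += (cp >= 0x10000) ? 2 : 1;
    return n;
}
```

For example, `utf8_decode("h\xC3\xA9llo").size()` is 5 while the string's `size()` is 6, and `utf8_encode(0x1F600)` yields 4 bytes. Note that code points are still not the same as user-perceived characters (a letter plus a combining accent is two code points), so line breaking, collation and rendering still call for a proper Unicode library.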
