> UTF-8 represents Unicode characters by a series of bytes, of between
> 1 and 4 bytes in length (the original specification allowed up to 6,
> but RFC 3629 restricts it to 4) - true ASCII characters (of value less
> than 128) are also valid UTF-8 and represented by 1 byte, and all
> other characters are represented by more than one byte. You can put
> any char value you want (including null characters and UTF-8 byte
> sequences) into a std::string object. UTF-8 is just another series of
> bytes as far as a std::string object is concerned, as is any other
> byte-based encoding such as ISO8859-1.
>
> A Glib::ustring object stores its UTF-8 contents as a series of bytes
> in the same way that a std::string object does (in fact, it contains a
> std::string object for that purpose). The main difference between a
> std::string object and a Glib::ustring object is that the
> Glib::ustring object counts its size, iterates and indexes itself with
> operator[]() by reference to whole Unicode characters rather than
> bytes - operator[]() will return an entire Unicode (gunichar)
> character for the index rather than a byte, as will dereferencing a
> Glib::ustring iterator. It can also search by reference to a Unicode
> (gunichar) character, and a Unicode (gunichar) character can be
> inserted into it (for that purpose the character will be converted
> into the equivalent UTF-8 byte representation and then inserted in the
> underlying std::string object).
>
> In many applications this extra functionality is irrelevant, and using
> a std::string object for storing and manipulating UTF-8 byte sequences
> will be fine and have less overhead. In addition, if you try to
> manipulate a Glib::ustring object after putting an invalid UTF-8 byte
> sequence into it, the Glib::ustring object will be in an undefined
> state, so you need to know that what you are putting into it is valid.
> (You can check this before manipulating it with
> Glib::ustring::validate().)
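To make the byte-versus-character distinction concrete, here is a minimal sketch using only the standard library. The names utf8_length and byte_vs_character_demo are made up for illustration and are not part of glibmm; the sketch assumes the string already holds valid UTF-8, and it is not how Glib::ustring is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a glibmm function): count the Unicode
// characters in a std::string that is assumed to hold valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does not match that pattern starts a new character.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical usage: "café" with the é encoded as the two bytes C3 A9.
void byte_vs_character_demo() {
    const std::string s = "caf\xC3\xA9";
    assert(s.size() == 5);        // std::string counts bytes
    assert(utf8_length(s) == 4);  // a character-based count sees 4
}
```

The second number is the kind of answer a character-based size would give for the same bytes, which is the difference described in the quoted text.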
>
> You can check whether a std::string object contains valid UTF-8 with
> g_utf8_validate(), and extract a Unicode character from the byte
> stream it contains with Glib::get_unichar_from_std_iterator(), so you
> can take your choice between using std::string or Glib::ustring
> depending on your needs.
>
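For anyone curious what extracting a character from the byte stream involves, below is a rough hand-rolled sketch. decode_utf8_at is a hypothetical name, not the glibmm function mentioned above and not its implementation; it assumes the iterator points at the lead byte of a complete, valid sequence of at most 4 bytes, so the data should be validated first.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a glibmm function): decode the code point
// whose UTF-8 sequence starts at `pos`. Assumes `pos` points at the lead
// byte of a complete, valid sequence; no bounds or validity checks here.
std::uint32_t decode_utf8_at(std::string::const_iterator pos) {
    const unsigned char lead = static_cast<unsigned char>(*pos);
    if (lead < 0x80) {
        return lead;  // 1 byte: 0xxxxxxx (plain ASCII)
    }

    int extra = 0;         // number of continuation bytes that follow
    std::uint32_t cp = 0;  // payload bits taken from the lead byte
    if ((lead & 0xE0) == 0xC0) {         // 2 bytes: 110xxxxx
        extra = 1;
        cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {  // 3 bytes: 1110xxxx
        extra = 2;
        cp = lead & 0x0F;
    } else {                             // 4 bytes: 11110xxx
        assert((lead & 0xF8) == 0xF0);
        extra = 3;
        cp = lead & 0x07;
    }

    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for (int i = 0; i < extra; ++i) {
        ++pos;
        cp = (cp << 6) | (static_cast<unsigned char>(*pos) & 0x3F);
    }
    return cp;
}
```

Called on an iterator to the start of the bytes C3 A9, this yields 0xE9 (é). In real code the library routines named in the quoted text are the better choice.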
That was very informative, Chris, thanks. In fact, it would make a nice
introduction to Glib::ustring in the gtkmm book, methinks (assuming
there isn't a better one already).

Gaz
