On Friday 28 October 2005 13:00, Matthias Kaeppler wrote:
> Let's say I have a filename named "übung1.txt" (Note the umlaut--if your
> newsreader can display it hehe).
> Will this filename make trouble with std::string, or be lost/replaced
> when converting to Unicode?

UTF-8 represents each Unicode character as a sequence of between 1 and 4 bytes (the original UTF-8 definition allowed up to 6). True ASCII characters (of value less than 128) are also valid UTF-8 and are represented by 1 byte; all other characters are represented by more than one byte.

You can put any char value you want (including null characters and UTF-8 byte sequences) into a std::string object. UTF-8 is just another series of bytes as far as a std::string object is concerned, as is any other byte-based encoding such as ISO8859-1.

A Glib::ustring object stores its UTF-8 contents as a series of bytes in the same way that a std::string object does (in fact, it contains a std::string object for that purpose). The main difference is that a Glib::ustring object counts its size, iterates, and indexes itself with operator[]() by reference to whole Unicode characters rather than bytes: operator[]() returns an entire Unicode (gunichar) character for the index rather than a byte, as does dereferencing a Glib::ustring iterator. It can also search by reference to a Unicode (gunichar) character, and a Unicode (gunichar) character can be inserted into it (for that purpose the character is converted into the equivalent UTF-8 byte representation and then inserted into the underlying std::string object).

In many applications this extra functionality is irrelevant, and using a std::string object for storing and manipulating UTF-8 byte sequences will be fine and have less overhead. In addition, if you try to manipulate a Glib::ustring object after putting an invalid UTF-8 byte sequence into it, the Glib::ustring object will be in an undefined state, so you need to know that what you are putting into it is valid. (You can check this before manipulating it with Glib::ustring::validate().)

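To make the byte/character distinction concrete, here is a rough sketch in plain standard C++ (no glibmm needed). The helper names count_code_points() and decode_at() are made up for illustration and are not part of Glib or the standard library; both assume the string already holds valid UTF-8 and do no validation of their own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode characters in a std::string that is
// assumed to hold valid UTF-8. Continuation bytes look like 10xxxxxx, so
// every byte that does not match that pattern starts a new character.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: decode the character whose first byte is at byte
// offset i. Assumes valid UTF-8 of at most 4 bytes per character.
char32_t decode_at(const std::string& s, std::size_t i) {
    assert(i < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) return lead;  // plain ASCII, one byte

    int extra = 0;      // number of continuation bytes that follow
    char32_t cp = 0;    // payload bits collected so far
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else                            { extra = 3; cp = lead & 0x07; }

    for (int k = 1; k <= extra; ++k) {
        assert(i + k < s.size());
        // Each continuation byte contributes its low 6 bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }
    return cp;
}
```

For a std::string holding "übung1.txt" as UTF-8 (the "ü" is the two bytes 0xC3 0xBC), size() gives 11 bytes, count_code_points() gives 10 characters, and decode_at(name, 0) gives U+00FC.
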
You can check whether a std::string object contains valid UTF-8 with g_utf8_validate(), and extract a Unicode character from the byte stream it contains with Glib::get_unichar_from_std_iterator(), so you can take your choice between using std::string or Glib::ustring depending on your needs.

Chris

_______________________________________________
gtkmm-list mailing list
gtkmm-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtkmm-list