Hello Lyu,

2007/10/22, Lyu Abe <[EMAIL PROTECTED]>:
> There's one thing I do not understand in character coding of the
> server's reply. When I display, for example, tag sets, I can read this:
>
> 'a_tag_label': u'citoyennet\xe9'
>
> in which " u'citoyennet\xe9' " corresponds to an unicode encoded text,
> right?
Yes.

> Then I do not understand why we get unicode encoded strings,
> while DEMEXP is supposed to have UTF-8 encoding...

"UTF-8 is the byte-oriented encoding form of Unicode."
http://www.unicode.org/faq/utf_bom.html#2

In other words, all strings on the server are stored as UTF-8, the byte encoding of Unicode characters. All exchanges between the server and the clients are done in UTF-8, a byte convention for representing Unicode characters. After that, each platform is free to do any appropriate conversion, e.g. to use a 16- or 32-bit character encoding if it wishes. The u'citoyennet\xe9' you see is simply Python's Unicode representation of that text once decoded: \xe9 is the code point of "é".

However, you should take care to set the default Python encoding to UTF-8 when you communicate with the server.

To be honest, right now the server does not check this encoding much. The choice of UTF-8 mainly came from the GTK2 interface, which produces UTF-8 strings. :-) But that check should be added at some point.

Best wishes,
d.

_______________________________________________
Demexp-dev mailing list
Demexp-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/demexp-dev
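The difference between Unicode text and its UTF-8 bytes can be seen in a few lines of Python. This is a minimal sketch for Python 3, where `str` plays the role of the `unicode` type printed as u'...' above and `bytes` holds what actually travels between client and server. The helpers `to_wire` and `from_wire` are hypothetical names for illustration, not part of the DEMEXP client, and the sketch encodes and decodes explicitly at the boundary rather than relying on a process-wide default encoding.

```python
# Python 3 sketch: str holds Unicode code points (like Python 2's u'...'),
# bytes holds the UTF-8 encoded form exchanged with the server.

def to_wire(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-8 before sending it."""
    return text.encode("utf-8")


def from_wire(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes received from the server."""
    return data.decode("utf-8")


label = "citoyenneté"  # the same text as u'citoyennet\xe9'

# '\xe9' in the repr is the single code point U+00E9 ("é").
assert label == "citoyennet\xe9"
assert ord(label[-1]) == 0xE9

# UTF-8 encodes that one code point as two bytes, 0xC3 0xA9.
wire = to_wire(label)
print(wire)                   # b'citoyennet\xc3\xa9'
print(len(label), len(wire))  # 11 12  (characters vs. bytes)

# Decoding the UTF-8 bytes gives back the original Unicode text.
assert from_wire(wire) == label
```

Seen this way, u'citoyennet\xe9' and the UTF-8 bytes b'citoyennet\xc3\xa9' are the same text at two different levels: what the program manipulates and what goes over the wire.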