Hi all, I'm just catching up on old messages, and it seems this one didn't get any response, so I'll try to add something :)
On Thu, 04-12-2003 at 12:11, Reinke Bonte wrote:
> > > > 2.) Get rid of the "wide character set" and use utf-8 for the
> > > > user I/O as well as the internal calculations.
> > > #2 is the correct option. We should just keep everything in UTF8.
> > Agreed, 100%.
>
> If my understanding is not completely wrong, you have to choose the
> first option and stick with wide characters.

No, it is not a must.

> There is no contradiction between "wide characters" and UTF-8. In fact,
> you need to use "wide characters" to properly handle UTF-8 encoded
> strings. Therefore #2 is not an option.

It is perfectly possible to handle UTF-8 encoded strings without using C wide characters. It's just that you can't use the standard str* functions for some tasks, that's all. The GLib/GDK libraries have replacement functions for UTF-8 strings, if I'm not mistaken.

> This is what my libc documentation says:
> [...]
> UTF-8 is an ASCII compatible encoding where ASCII characters are
> represented by ASCII bytes and non-ASCII characters by sequences of 2-6
> non-ASCII bytes, and finally UTF-16 is an extension of UCS-2 in which
> pairs of certain UCS-2 words can be used to encode non-BMP characters up
> to 0x10ffff.
> To represent wide characters the char type is not suitable. For this
> reason the ISO C standard introduces a new type which is designed to
> keep one character of a wide character string. To maintain the
> similarity there is also a type corresponding to int for those functions
> which take a single wide character.
> Data type: wchar_t
> This data type is used as the base type for wide character strings.
> I.e., arrays of objects of this type are the equivalent of char[] for
> multibyte character strings. The type is defined in `stddef.h'.
> [...]

I don't see why this means we need to use GdkWChar (note: it is not wchar_t).

> > > The hard part is going to be converting the existing XML and
> > > database data from whatever it's currently using to UTF8.
> > We don't currently include an "encoding" in the XML data file. That
> > could be used as a trigger to ask the user for the old encoding and
> > then convert the data to UTF-8. A nice touch would be to scan the
> > file first looking for any characters with the high order bit set to
> > see if conversion is needed in the first place.
>
> I don't know about database data, but the XML file is a complete mess.
> You will not find any high order bit set in the XML file, because libxml
> has converted everything into HTML-entities. But unfortunately the wrong
> entities for every encoding != Latin1. Here a manual recoding of the
> XML-File is necessary, as I described twice here on this mailing list.

I think it is perfectly possible to parse the XML file with its parser, then check all the strings (already decoded from HTML entities by the parser).

Regards
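
P.S. To make the two points above a bit more concrete, here is a rough sketch in plain C, not GnuCash code. Both function names (has_high_bit and utf8_char_count) are made up for illustration. The first is the "scan for the high order bit" idea, which you would run on strings after the parser has decoded them. The second counts characters in a UTF-8 string using only char, with no wchar_t or GdkWChar, and it assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any byte in buf has its high-order
 * bit set (so the data is not pure ASCII and may need conversion),
 * 0 otherwise. */
static int has_high_bit(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)buf[i] & 0x80)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: counts characters (code points) in a
 * NUL-terminated UTF-8 string by skipping continuation bytes, which
 * always have the bit pattern 10xxxxxx. Assumes valid UTF-8 input;
 * it does no validation of its own. */
static size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a string like "caf\xc3\xa9" (the word "café" in UTF-8), strlen() counts bytes while utf8_char_count() counts characters, which is exactly the kind of task where the standard str* functions fall short. In real code I would rather use GLib's g_utf8_strlen() and g_utf8_validate() than hand-rolled loops like these.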