> > > 2.) Get rid of the "wide character set" and use utf-8 for the
> > > user I/O as well as the internal calculations.
> >
> > #2 is the correct option. We should just keep everything in UTF8.
>
> Agreed, 100%.

If my understanding is not completely wrong, you have to choose the
first option and stick with wide characters. There is no contradiction
between "wide characters" and UTF-8. In fact, you need "wide
characters" to handle UTF-8 encoded strings properly. Therefore #2 is
not an option.

This is what my libc documentation says:

  [...] UTF-8 is an ASCII compatible encoding where ASCII characters
  are represented by ASCII bytes and non-ASCII characters by sequences
  of 2-6 non-ASCII bytes, and finally UTF-16 is an extension of UCS-2
  in which pairs of certain UCS-2 words can be used to encode non-BMP
  characters up to 0x10ffff.

  To represent wide characters the char type is not suitable. For this
  reason the ISO C standard introduces a new type which is designed to
  keep one character of a wide character string. To maintain the
  similarity there is also a type corresponding to int for those
  functions which take a single wide character.

  Data type: wchar_t
    This data type is used as the base type for wide character
    strings. I.e., arrays of objects of this type are the equivalent
    of char[] for multibyte character strings. The type is defined in
    `stddef.h'.
  [...]

> > The hard part is going to be converting the existing XML and
> > database data from whatever it's currently using to UTF8.
>
> We don't currently include an "encoding" in the XML data file. That
> could be used as a trigger to ask the user for the old encoding and
> then convert the data to UTF-8. A nice touch would be to scan the
> file first looking for any characters with the high order bit set to
> see if conversion is needed in the first place.

I don't know about the database data, but the XML file is a complete
mess. You will not find any byte with the high order bit set in the XML
file, because libxml has converted everything into HTML entities.
Unfortunately, those entities are wrong for every encoding other than
Latin1. In that case a manual recoding of the XML file is necessary, as
I have already described twice on this mailing list.

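To make the point about the scan concrete, here is a minimal sketch in C. It is only an illustration, not GnuCash or libxml code: the names scan_result, scan_for_non_ascii and needs_encoding_review are made up for this mail, and I am assuming that the entities in the file look like numeric character references starting with "&#". It counts both bytes with the high order bit set and such references, because a check for high-bit bytes alone would report nothing for a file that contains only entities.

```c
#include <stddef.h>
#include <string.h>

/* Counts gathered while scanning a buffer (hypothetical type). */
struct scan_result {
    size_t high_bit_bytes;   /* bytes with the high order bit set */
    size_t char_references;  /* occurrences of "&#" (assumed entity form) */
};

/* Scan len bytes of buf; hypothetical helper, not a libxml API. */
static struct scan_result scan_for_non_ascii(const char *buf, size_t len)
{
    struct scan_result r = {0, 0};

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c & 0x80)
            r.high_bit_bytes++;
        if (c == '&' && i + 1 < len && buf[i + 1] == '#')
            r.char_references++;
    }
    return r;
}

/* Returns 1 if the NUL-terminated string s holds anything beyond
 * plain ASCII text, either as raw bytes or as "&#" references. */
static int needs_encoding_review(const char *s)
{
    struct scan_result r = scan_for_non_ascii(s, strlen(s));

    return r.high_bit_bytes > 0 || r.char_references > 0;
}
```

With this, a string such as "M&#252;ller" is flagged through the reference count even though it contains no high-bit byte. Whether a flagged reference is actually wrong still has to be decided by someone who knows the original encoding.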
Reinke
_______________________________________________
gnucash-devel mailing list
[EMAIL PROTECTED]
http://www.gnucash.org/cgi-bin/mailman/listinfo/gnucash-devel