On Thu, Dec 13, 2001 at 05:55:48PM -0100, Bengt Johansson wrote:
> I have an application that internally uses Unicode. The application

Do you mean UTF-8? (And does it really use Unicode, or does it use the
locale encoding? If the former, you need to be careful to convert as
necessary whenever you do any sort of I/O, so the latter is usually
better.)

> I don't know much about this, so I was hoping that the wide character
> stuff would work with my Unicode strings - and in fact it did, with a
> German installation of Linux, but on an English installation it stopped
> working.

Are you setting the locale correctly?

> After reading about the wide character functions I realized that they
> are locale dependent. But my program is not. The Unicode strings in my
> program may contain any Unicode characters, no matter what the locale
> is.

From http://www.cl.cam.ac.uk/~mgk25/unicode.html:

"C support for Unicode and UTF-8

Starting with GNU glibc 2.2, the type wchar_t is officially intended to
be used only for 32-bit ISO 10646 values, independent of the currently
used locale. This is signalled to applications by the definition of the
__STDC_ISO_10646__ macro as required by ISO C99. The ISO C multi-byte
conversion functions (mbsrtowcs(), wcsrtombs(), etc.) are fully
implemented in glibc 2.2 or higher and can be used to convert between
wchar_t and any locale-dependent multibyte encoding, including UTF-8,
ISO 8859-1, etc."

> Does anybody have any suggestions as to what format I should use when
> communicating with external libraries like the Gdk libraries, or even the
> stdlib and its string functions? It seems to me that wide character
> would be the right solution, but on the English installation these calls
> crash as soon as the (32-bit) character code is larger than 255.
>
> Is there a standard way to map arbitrary Unicode characters to wide
> character without taking the locale into account?

First, though, I'd find out whether the char * versions of the GTK calls
honor the locale. If they don't, complain to the GTK developers loudly;
they should.

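To make the conversion step concrete, here is a minimal sketch of turning
a wchar_t string into the locale's multibyte encoding with wcsrtombs(), so
that it can be passed to a char * interface. It assumes a C library that
defines __STDC_ISO_10646__ (as glibc 2.2 and later do); wide_to_locale is
a hypothetical helper name of my own, not something glibc or GTK provides.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Assumption of this sketch: wchar_t values are ISO 10646 code points.
 * Checked after the includes so the C library headers can define it. */
#ifndef __STDC_ISO_10646__
#error "this sketch requires wchar_t to hold ISO 10646 code points"
#endif

/* Hypothetical helper: convert a wide string to a newly allocated string
 * in the encoding of the current LC_CTYPE locale.  Returns NULL if some
 * character cannot be represented in that encoding, or if allocation
 * fails.  The caller frees the result. */
char *wide_to_locale(const wchar_t *ws)
{
    mbstate_t st;
    const wchar_t *src = ws;
    size_t need;
    char *out;

    /* First pass: with a NULL destination, wcsrtombs() only measures. */
    memset(&st, 0, sizeof st);
    need = wcsrtombs(NULL, &src, 0, &st);
    if (need == (size_t)-1)
        return NULL;            /* character not representable */

    out = malloc(need + 1);     /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;

    /* Second pass: do the conversion from a fresh shift state. */
    memset(&st, 0, sizeof st);
    src = ws;
    if (wcsrtombs(out, &src, need + 1, &st) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and then use
wide_to_locale() wherever it hands text to a char * function. If the locale
was never set, the program runs in the "C" locale, where the helper would
return NULL for most non-ASCII characters; that may be related to the
difference you saw between the two installations.
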
If you really, really need to use wchar versions, you're probably better
off requiring __STDC_ISO_10646__. (That's probably reasonable for GTK
apps, but I'm no advocate of supporting obsolete compilers, so you'd do
well to get other opinions.)

If the GTK functions do honor the locale (and, since they probably use C
functions, they probably do to some degree), you're much better off using
the char * versions. Debugging wchar-based programs is a real pain.

As to GTK crashing, that'd be a bug (whether it honors the locale
explicitly or not), so I'd report it. (If you're setting the locale and
it's not expecting you to, that could cause this.)

--
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
