I want to provide an equivalent of wcwidth for Haskell, Unicode only. Sometimes I can use wcwidth from C; sometimes I can use my own implementation, along the lines of <http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c> (which should be simple, because I already have the character database). I don't know of other alternatives.

I need to decide which implementation to use. I am currently thinking about using wcwidth when it is available and __STDC_ISO_10646__ is defined, and using a private implementation otherwise. Is there a better strategy? This one implies that the private implementation will be used under glibc-2.1.3, even though wcwidth is present there. Maybe, instead of or in addition to checking for __STDC_ISO_10646__, the configure script should somehow test whether wcwidth behaves as it should? (A sketch of what I mean is in the P.S. below.)

* * *

Another question. As you suggested, I am using iconv for the conversion between the local byte encoding and Unicode (falling back to ISO-8859-1 if iconv is unavailable or unusable). To do this, the configure script needs to find out which flavors of Unicode iconv provides. The current strategy is as follows.

First, how to use iconv at all: try to compile and run a test program which #includes <iconv.h> and converts between "ISO-8859-1" and "ISO-8859-1", first without linking any extra libraries, then with -liconv. Actually performing a conversion seems necessary: for example, installing Konstantin Chuguev's iconv onto glibc-2.1.3 and using its <iconv.h> without -liconv produces programs that dump core, because his macros get used together with glibc's functions.

Then, how to talk to it. I run test programs which try to convert the string "\300" from "ISO-8859-1" to encodings called "wchar_t", "UCS-4-INTERNAL", "UCS-4" and "UTF-8". For each of them I check whether the result looks like one of: UCS-4 in native endianness, UCS-4 in big-endian order, or UTF-8. Then I use the first name found to denote one of these encodings, in order of my preference; if none is found, I fall back to ISO-8859-1. This seems to work with the iconv implementations from glibc-2.1.3, from Bruno and from Konstantin, even though "UCS-4" means native endianness in one of them and big-endian in the others. (The P.P.S. below sketches the probe I mean.)

For the local encoding: if <langinfo.h> is present, I use nl_langinfo(CODESET); if not, I fall back to ISO-8859-1. Perhaps this should be improved, but the nl_langinfo(CODESET) replacement I've seen in GNU fileutils is messy. Which systems support iconv but don't support nl_langinfo?

That was configure time. At runtime: if iconv refuses to convert between the charsets determined as above, fall back to ISO-8859-1. Note that since this conversion will be used by default in all I/O, it absolutely must do something sensible at least for ASCII, and preferably just pass other characters through unmodified when in trouble.

Does anybody have a better idea? What other encodings, ways of linking, or nl_langinfo replacements are worth trying? I assume that if iconv works at all, it provides "ISO-8859-1" under that name.

A general disadvantage is that all binaries compiled with a Haskell compiler which eventually uses this stuff will depend on iconv. Hopefully it will not bite people in practice; they already depend on libgmp.

I guess that iconv is not used on Windows at all. The Windows side will have to be implemented by another person, somebody who knows how to do it and can test it.

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK
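
P.S. Here is a rough sketch of the configure-time wcwidth sanity check I have in mind, to be compiled and run by the configure script: exit status 0 would mean "wcwidth can be trusted", anything else would select the private implementation. It is untested, and the file name, the locale names tried and the sample code points checked are only my guesses at a reasonable minimum, nothing authoritative.

    /* conftest-wcwidth.c -- does wcwidth() behave sanely? */
    #define _XOPEN_SOURCE 500       /* for the wcwidth() declaration */
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
    #ifndef __STDC_ISO_10646__
        return 1;   /* wchar_t values are not known to be ISO 10646 */
    #else
        /* wcwidth() depends on the locale, so switch to some UTF-8
           locale first; the names tried here are only the common ones. */
        const char *locales[] = { "en_US.UTF-8", "en_US.utf8", "C.UTF-8", 0 };
        int i;
        for (i = 0; locales[i]; i++)
            if (setlocale(LC_CTYPE, locales[i]))
                break;
        if (!locales[i])
            return 1;   /* no UTF-8 locale available for the test */
        /* Spot checks: ASCII is narrow, CJK is wide, combining
           characters and NUL are zero-width. */
        if (wcwidth(L'A') != 1) return 1;
        if (wcwidth(0x3042) != 2) return 1;   /* HIRAGANA LETTER A */
        if (wcwidth(0x0301) != 0) return 1;   /* COMBINING ACUTE ACCENT */
        if (wcwidth(0) != 0) return 1;
        return 0;
    #endif
    }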

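P.P.S. And a sketch of the probe for one candidate Unicode encoding name; the configure script would run it once for each of "wchar_t", "UCS-4-INTERNAL", "UCS-4" and "UTF-8" and keep the first name whose output it can classify. The exit codes and printed labels are merely a convention invented for this sketch, and some iconv flavors declare the input buffer of iconv() as const char **, which a real test would have to cope with.

    /* conftest-iconv.c -- convert "\300" (LATIN CAPITAL LETTER A WITH
       GRAVE in ISO-8859-1) to the encoding named in argv[1] and report
       which form the result takes. */
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char in[] = "\300";
        char out[16];
        char *inp = in, *outp = out;
        size_t inleft = 1, outleft = sizeof out, n;
        unsigned int native = 0xC0;   /* UCS-4 "\300" in host byte order,
                                         assuming a 32-bit int */
        iconv_t cd;

        if (argc != 2)
            return 2;
        cd = iconv_open(argv[1], "ISO-8859-1");
        if (cd == (iconv_t)(-1))
            return 2;   /* this iconv does not know the name at all */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)(-1))
            return 2;   /* conversion refused */
        iconv_close(cd);

        n = sizeof out - outleft;
        /* On a big-endian machine the first two tests coincide, and
           either label is then correct. */
        if (n == 4 && memcmp(out, &native, 4) == 0)
            puts("ucs4-native");
        else if (n == 4 && memcmp(out, "\0\0\0\300", 4) == 0)
            puts("ucs4-be");
        else if (n == 2 && memcmp(out, "\303\200", 2) == 0)
            puts("utf8");
        else
            return 2;   /* result in some unexpected form */
        return 0;
    }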