Hi,

On Sun, Apr 18, 2010 at 10:53:22AM -0700, Judah Jacobson wrote:
> > Anyway, the short story is that I have to either hard-code the
> > character set to something like utf-8, or ghc will start to behave
> > really strange (for example, ghci would terminate immediately if
> > you just *type* a non-ASCII character).
>
> That sounds like it might be something to do with the haskeline
> package, which ghci uses for user interaction.  Haskeline makes its
> own FFI calls to translate raw input bytes into Unicode Chars.

Oh, this may indeed be a second problem.  However, the encoding problem
itself also manifests in the `openTempFile001' test of the testsuite.
For example, with an unpatched ghc-6.12, the test fails with the
following output:

=====> openTempFile001(normal) 1048 of 2375 [0, 38, 0]
cd ./lib/IO && '/usr/obj/ports/ghc-6.12.2/ghc-6.12.2/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -no-user-package-conf -dno-debug-output -o openTempFile001 openTempFile001.hs >openTempFile001.comp.stderr 2>&1
cd ./lib/IO && ./openTempFile001 </dev/null >openTempFile001.run.stdout 2>openTempFile001.run.stderr
Wrong exit code (expected 0 , actual 1 )
Stdout:

Stderr:
openTempFile001: ./test22236.txt: hClose: invalid argument (Illegal byte sequence)

*** unexpected failure for openTempFile001(normal)

> Can
> you elaborate further on what exactly the issue is with OpenBSD's
> locale support?  In particular, there's several components used by
> Haskeline:
> - call set_locale(LC_CTYPE)

Problem number 1: setlocale(LC_CTYPE, ...) fails (i.e. returns NULL)
for any locale except `C' or `POSIX'.  Did I mention that OpenBSD is
really bad with locales? ;-)

> - call nl_langinfo(CODESET)

Always returns `646' (ASCII).  Duh.

> - pass the resulting string (which should be, e.g., $LANG) to iconv_open

iconv_open appears to need the *codeset* name, not a complete locale.
Note that OpenBSD uses GNU libiconv-1.13, which AFAIK differs from the
one included in glibc.  Even worse, I have to pass something like
"UTF-8", whereas "UTF8" doesn't work.

> - call iconv on user input (which may be malformed)

I wrote a little C program that does the following (some error checks
omitted here):

    iconv_t ic;
    char *inp, *outp;
    size_t insz, outsz;
    unsigned char in[] = {0xa9, 0, 0, 0};   /* U+00A9 in UTF-32LE */
    char out[512];

    inp = (char *)in;
    outp = out;
    insz = sizeof(in);
    outsz = sizeof(out) - 1;
    setlocale(LC_CTYPE, "");
    ic = iconv_open("", "UTF-32LE");    /* "" selects the locale's codeset */
    if (iconv(ic, &inp, &insz, &outp, &outsz) == (size_t)-1) {
        ... bail out (perror() etc.) ...
    }
    iconv_close(ic);
    *outp = 0;
    puts(out);

And it just doesn't work, regardless of what I set LC_CTYPE to.  The
only way to get it to print the copyright symbol is to explicitly use
"UTF-8" (or "ISO-8859-1" or something else that knows about that
symbol) as the first argument to iconv_open().

> Is the problem that setting $LC_ALL or $LANG has no effect on the
> string returned by nl_langinfo, so the translation fails?

Yes, see above.

> If so,
> haskeline is supposed to output "?"s in that case, so there might be a
> bug in the package.

It fails (or rather: ghci fails, since I haven't yet done any separate
haskeline tests) with the same error as the test mentioned above, with
the difference that it fails on hPutChar instead of hClose, for obvious
reasons.

> Finally, when you say you have to "hard-code the character set", are
> you talking about ghc, haskeline, the base library, or somewhere else?

I'm talking about libraries/base/GHC/IO/Encoding/Iconv.hs

See?  There just is no non-hackerish way to fix this (except of course
improving locale support on OpenBSD, but that's beyond my scope
currently).

Ciao,
    Kili
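
P.S. For anyone who wants to poke at this on their own system, here is a
minimal sketch in C of the "try the locale's codeset, then fall back to a
hard-coded one" idea discussed above.  The helper names convert_utf32le()
and convert_with_fallback() are made up for illustration and are not
taken from GHC, the base library, or haskeline; the choice of "UTF-8" as
the fallback is likewise an assumption, not a recommendation.  It only
uses the standard setlocale(), nl_langinfo(), iconv_open(), iconv() and
iconv_close() calls.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): convert `insz` bytes of
 * UTF-32LE input to the codeset named `tocode`.  Returns the number of
 * bytes written to `out`, or (size_t)-1 on any failure.
 */
static size_t convert_utf32le(const char *tocode, char *in, size_t insz,
                              char *out, size_t outsz)
{
    char *inp = in, *outp = out;
    size_t outleft = outsz;
    size_t rc;
    iconv_t ic;

    ic = iconv_open(tocode, "UTF-32LE");
    if (ic == (iconv_t)-1)
        return (size_t)-1;      /* codeset name not known to this iconv */

    rc = iconv(ic, &inp, &insz, &outp, &outleft);
    iconv_close(ic);
    if (rc == (size_t)-1)
        return (size_t)-1;      /* e.g. EILSEQ: target cannot represent input */

    return outsz - outleft;
}

/*
 * Hypothetical helper (illustration only): try the codeset reported by
 * the locale first; if that fails, retry with a hard-coded "UTF-8".
 * The fallback choice is an assumption made for this sketch.
 */
static size_t convert_with_fallback(char *in, size_t insz,
                                    char *out, size_t outsz)
{
    size_t n = (size_t)-1;

    /* setlocale() may return NULL where locale support is weak. */
    if (setlocale(LC_CTYPE, "") != NULL) {
        const char *cs = nl_langinfo(CODESET);
        n = convert_utf32le(cs, in, insz, out, outsz);
    }
    if (n == (size_t)-1)
        n = convert_utf32le("UTF-8", in, insz, out, outsz);
    return n;
}
```

Calling convert_with_fallback() on the four bytes {0xa9, 0, 0, 0} from
the program above should yield the copyright sign in whatever codeset
ends up being used; with the explicit "UTF-8" target that is the two
bytes 0xc2 0xa9.  Whether silently emitting UTF-8 is acceptable on a
terminal that claims to be ASCII is a separate question.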