Hi,

On Sun, Apr 18, 2010 at 10:53:22AM -0700, Judah Jacobson wrote:
> > Anyway, the short story is that I have to either hard-code the
> > character set to something like utf-8, or ghc will start to behave
> > really strange (for example, ghci would terminate immediately if
> > you just *type* a non-ASCII character).
> 
> That sounds like it might be something to do with the haskeline
> package, which ghci uses for user interaction.  Haskeline makes its
> own FFI calls to translate raw input bytes into Unicode Chars.

Oh, this may indeed be a second problem. However, the encoding
problem itself also manifests in the `openTempFile001' test of the
testsuite.  For example, with an unpatched ghc-6.12, the test fails
with the following output:

=====> openTempFile001(normal) 1048 of 2375 [0, 38, 0]
cd ./lib/IO && '/usr/obj/ports/ghc-6.12.2/ghc-6.12.2/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -no-user-package-conf  -dno-debug-output -o openTempFile001 openTempFile001.hs    >openTempFile001.comp.stderr 2>&1
cd ./lib/IO && ./openTempFile001    </dev/null >openTempFile001.run.stdout 2>openTempFile001.run.stderr
Wrong exit code (expected 0 , actual 1 )
Stdout:

Stderr:
openTempFile001: ./test22236.txt: hClose: invalid argument (Illegal byte sequence)

*** unexpected failure for openTempFile001(normal)


> Can
> you elaborate further on what exactly the issue is with OpenBSD's
> locale support?  In particular, there's several components used by
> Haskeline:
>  - call set_locale(LC_CTYPE)

Problem number 1: setlocale(LC_CTYPE, ...) fails (i.e. returns NULL)
for any locale except `C' or `POSIX'. Did I mention that OpenBSD
is really bad with locales? ;-)

>  - call nl_langinfo(CODESET)

Always returns `646' (ASCII). Duh.
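
The two checks above can be reproduced with a few lines of C. This is
only a minimal sketch: the function names try_ctype_locale() and
report_ctype_locale() are made up for illustration, and only
setlocale() and nl_langinfo() are real library calls.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Returns 1 if setlocale() accepted `name' for LC_CTYPE, 0 if it returned NULL. */
int try_ctype_locale(const char *name)
{
        return setlocale(LC_CTYPE, name) != NULL;
}

/* Prints whether the locale was accepted and which codeset is reported afterwards. */
void report_ctype_locale(const char *name)
{
        int ok = try_ctype_locale(name);

        printf("setlocale(LC_CTYPE, \"%s\"): %s, codeset: %s\n",
            name, ok ? "ok" : "failed (NULL)", nl_langinfo(CODESET));
}
```

Calling report_ctype_locale("C"), report_ctype_locale("") and
report_ctype_locale("en_US.UTF-8") in turn shows which locale names
the C library accepts and whether the reported codeset ever changes.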

>  - pass the resulting string (which should be, e.g., $LANG) to iconv_open

iconv_open appears to need the *codeset* name, not a complete locale.
Note that OpenBSD uses GNU libiconv-1.13, which AFAIK differs from
the one included in glibc. Even worse, I have to pass something
like "UTF-8", whereas "UTF8" doesn't work.
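
One way to cope with a codeset name that iconv_open() rejects is to
fall back to a hard-coded one. The following is a rough sketch of that
workaround, not what GHC or haskeline actually do; the helper name
open_from_utf32le() and the choice of "UTF-8" as the fallback are
assumptions made for the example.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Opens a converter from UTF-32LE to `preferred'. If iconv does not
 * know that name, falls back to the hard-coded "UTF-8" (an assumption
 * of this sketch). Returns (iconv_t)-1 if both attempts fail.
 */
iconv_t open_from_utf32le(const char *preferred)
{
        iconv_t cd = (iconv_t)-1;

        if (preferred != NULL)
                cd = iconv_open(preferred, "UTF-32LE");
        if (cd == (iconv_t)-1)
                cd = iconv_open("UTF-8", "UTF-32LE");
        return cd;
}
```

A caller would typically pass nl_langinfo(CODESET) as `preferred'.
Note that this only covers the case where iconv_open() rejects the
name; it does nothing about nl_langinfo() reporting ASCII in the
first place.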

>  - call iconv on user input (which may be malformed)

I wrote a little C program that does the following (some error
checks omitted here):

        char *inp, *outp;
        size_t insz, outsz;
        iconv_t ic;
        unsigned char in[] = {0xa9, 0, 0, 0};   /* U+00A9 as UTF-32LE */
        char out[512];

        inp = (char *)in;
        outp = out;
        insz = sizeof(in);
        outsz = sizeof(out) - 1;
        setlocale(LC_CTYPE, "");
        /* to: "" (the locale's encoding in GNU libiconv), from: UTF-32LE */
        ic = iconv_open("", "UTF-32LE");
        if (iconv(ic, &inp, &insz, &outp, &outsz) == (size_t)-1) {
                ... bail out (perror() etc.) ...
        }
        iconv_close(ic);
        *outp = 0;
        puts(out);

And it just doesn't work, regardless of what I set LC_CTYPE to. The
only way to get it to print the copyright symbol is to explicitly
use "UTF-8" (or "ISO-8859-1" or something else that knows about
that symbol) as the first argument to iconv_open().
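
For comparison, here is a self-contained variant of the fragment above
that takes the target encoding as a parameter, so the explicit
"UTF-8" case can be tried directly. It is a sketch only: the function
name convert_codepoint() and its signature are invented for this
example.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Converts one code point to `tocode', writing a NUL-terminated
 * string into `out'. Returns 0 on success, -1 on any failure.
 */
int convert_codepoint(const char *tocode, uint32_t cp, char *out, size_t outsz)
{
        /* Lay the code point out as UTF-32LE bytes, whatever the host byte order. */
        char in[4] = { (char)(cp & 0xff), (char)((cp >> 8) & 0xff),
            (char)((cp >> 16) & 0xff), (char)((cp >> 24) & 0xff) };
        char *inp = in, *outp = out;
        size_t insz = sizeof(in), outleft, rc;
        iconv_t ic;

        if (outsz == 0)
                return -1;
        outleft = outsz - 1;            /* keep room for the terminating NUL */
        ic = iconv_open(tocode, "UTF-32LE");
        if (ic == (iconv_t)-1)
                return -1;
        rc = iconv(ic, &inp, &insz, &outp, &outleft);
        iconv_close(ic);
        if (rc == (size_t)-1)
                return -1;
        *outp = '\0';
        return 0;
}
```

With a buffer `char buf[16]', the call
convert_codepoint("UTF-8", 0xa9, buf, sizeof(buf)) should leave the
two bytes 0xc2 0xa9 in buf, i.e. the UTF-8 form of the copyright
symbol.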

> Is the problem that setting $LC_ALL or $LANG has no effect on the
> string returned by nl_langinfo, so the translation fails?

Yes, see above.

> If so,
> haskeline is supposed to output "?"s in that case, so there might be a
> bug in the package.

It fails (or rather: ghci fails, since I didn't yet do any separate
haskeline tests) with the same error as the test mentioned above,
with the difference that it fails on hPutChar instead of hClose for
obvious reasons.
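
The "?" substitution mentioned in the quoted text could look roughly
like the following in C. This is not haskeline's actual code, just a
sketch of the idea under the assumption that the input is UTF-32LE;
the function name convert_lossy() is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Converts `insz' bytes of UTF-32LE to `tocode'. A character that
 * iconv rejects (EILSEQ) or a truncated tail (EINVAL) is skipped and
 * replaced by '?'. Returns 0 on success, -1 on any other error.
 */
int convert_lossy(const char *tocode, char *in, size_t insz,
    char *out, size_t outsz)
{
        char *inp = in, *outp = out;
        size_t outleft, skip;
        iconv_t ic;

        if (outsz == 0)
                return -1;
        outleft = outsz - 1;            /* keep room for the terminating NUL */
        ic = iconv_open(tocode, "UTF-32LE");
        if (ic == (iconv_t)-1)
                return -1;
        while (insz > 0) {
                if (iconv(ic, &inp, &insz, &outp, &outleft) != (size_t)-1)
                        break;          /* everything was converted */
                if ((errno != EILSEQ && errno != EINVAL) || outleft == 0) {
                        iconv_close(ic);
                        return -1;      /* e.g. E2BIG: output buffer full */
                }
                /* Skip one 4-byte unit (or the truncated tail) and emit '?'. */
                skip = insz < 4 ? insz : 4;
                inp += skip;
                insz -= skip;
                *outp++ = '?';
                outleft--;
        }
        iconv_close(ic);
        *outp = '\0';
        return 0;
}
```

Feeding it "a", U+00A9, "b" as UTF-32LE with "ASCII" as the target
should yield "a?b" with glibc or GNU libiconv; other iconv
implementations may substitute a character of their own instead of
reporting EILSEQ.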

> Finally, when you say you have to "hard-code the character set", are
> you talking about ghc, haskeline, the base library, or somewhere else?

I'm talking about libraries/base/GHC/IO/Encoding/Iconv.hs

See? There just is no non-hackerish way to fix this (except of
course improving locale support on OpenBSD, but that's beyond my
scope currently).

Ciao,
        Kili