Ian Lynagh wrote:
On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote:
On 1/22/08, Duncan Coutts wrote:
On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
was able to produce a different answer than
length "a string"
That seems unlikely.
Unlikely yes, yet I get the following in GHCi (ghc 6.6.1, the version
currently in Debian Sid):
map ord "a"
[97]
map ord "ö"
[195,182]
In 6.6.1:
Prelude Data.Char> map ord "ö"
[195,182]
Prelude Data.Char> length "ö"
2
There are actually 2 bytes there, but your terminal is showing them as
one character.
Still, that seems weird to me. A Haskell Char is a Unicode code point. An
"ö" is either a single code point (U+00F6, decimal 246, which UTF-8
encodes as the two bytes 195 and 182) or an "o" followed by a combining
diaeresis (U+0308, decimal 776). Since the second value in the result is
182 rather than 776, this is not the combining form, so the "ö" here
should be just one character. I'd suspect that the two-character string
comes from the terminal sending UTF-8 while GHC 6.6 expects Latin-1 and
so turns each byte into its own Char. GHC 6.8 expects UTF-8, so all is
fine there.
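
To make that suspicion concrete, here is a minimal sketch of what I think is going on. The names utf8Encode and misreadAsLatin1 are my own, purely for illustration, and this only models the suspected behaviour; it is not how GHC itself reads input. It encodes a string as UTF-8 by hand and then treats every resulting byte as a separate Char, the way a Latin-1 reader would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hand-rolled UTF-8 encoder for a single code point (illustrative helper).
utf8Encode :: Char -> [Int]
utf8Encode c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n      = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- What a Latin-1 reader would see if it were handed UTF-8 bytes:
-- one Char per byte (hypothetical model of the suspected mismatch).
misreadAsLatin1 :: String -> String
misreadAsLatin1 = map chr . concatMap utf8Encode

main :: IO ()
main = do
  let precomposed = "\xF6"    -- "ö" as the single code point U+00F6
      decomposed  = "o\x308"  -- "o" followed by combining diaeresis U+0308
  print (map ord precomposed)                    -- [246]
  print (map ord decomposed)                     -- [111,776]
  print (map ord (misreadAsLatin1 precomposed))  -- [195,182]
  print (length (misreadAsLatin1 precomposed))   -- 2
```

The third line of output matches the [195,182] reported from GHCi 6.6.1, which is consistent with the UTF-8 versus Latin-1 explanation, though it does not prove it.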
On my MacBook (OS X 10.4), 'ö' also immediately expands to "\303\266"
(octal for the bytes 195 and 182) when I type it in my terminal, even
outside GHCi. That suggests that the terminal program doesn't handle
Unicode and immediately escapes non-ASCII characters.
Regards,
Reinier