Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Magnus Therning
On 1/22/08, Ian Lynagh wrote: On Tue, Jan 22, 2008 at 03:59:24PM +0000, Magnus Therning wrote: Yes, of course, stupid me. But it is still the UTF-8 representation of ö, not Latin-1, and this brings me back to my original question, is this an intentional change in 6.8?

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Ketil Malde
Peter Verswyvelen writes: Prelude Data.Char> map ord "ö" [195,182] Prelude Data.Char> length "ö" 2 there are actually 2 bytes there, but your terminal is showing them as one character. So let's all switch to unicode ASAP and leave that horrible multi-byte-string-thing behind

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Peter Verswyvelen
Ketil Malde wrote: So let's all switch to unicode ASAP and leave that horrible multi-byte-string-thing behind us? You are being ironic, I take it? No I just used wrong terminology. When I said unicode, I actually meant UCS-x, and with multi-byte-string-thing I meant VARIABLE-length,

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Jules Bean
Peter Verswyvelen wrote: Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2 use for [Char]? UCS-32? How dare you! Such a personal question! This is none of your business. I jest, but the point is sound: the internal storage of Char is ghc's business, and it

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Ketil Malde
Peter Verswyvelen writes: No I just used wrong terminology. When I said unicode, I actually meant UCS-x, You might as well say UCS-4, nobody uses UCS-2 anymore. It's been replaced by UTF-16, which gives you the complexity of UTF-8 without being compact (for 99% of existing

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Johan Tibell
On Jan 23, 2008 11:56 AM, Jules Bean wrote: Peter Verswyvelen wrote: Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2 use for [Char]? UCS-32? [snip] What *does* matter to the programmer is what encodings putStr and getLine use. AFAIK,

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Jules Bean
Johan Tibell wrote: On Jan 23, 2008 11:56 AM, Jules Bean wrote: Peter Verswyvelen wrote: Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2 use for [Char]? UCS-32? [snip] What *does* matter to the programmer is what encodings putStr and getLine

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread david48
On Jan 23, 2008 12:13 PM, Jules Bean wrote: Presumably there wasn't a sufficiently good answer available in time for haskell98. Will there be one for haskell prime?

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Johan Tibell
What *does* matter to the programmer is what encodings putStr and getLine use. AFAIK, they use lower 8 bits of unicode code point which is almost functionally equivalent to latin-1. Which is terrible! You should have to be explicit about what encoding you expect. Python 3000
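To make the "lower 8 bits" remark concrete, here is a small model of the behaviour being described. It is only a sketch: truncateChar is a made-up name for illustration and this is not GHC's actual I/O code. Code points up to 255 pass through unchanged (which is why the result looks like Latin-1), while anything above 255 is silently turned into a different character.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of a code point,
-- modelling the output behaviour described in the thread.
truncateChar :: Char -> Char
truncateChar c = chr (ord c `mod` 256)

main :: IO ()
main = do
  -- U+00F6 (ö) fits in 8 bits, so it survives unchanged.
  print (ord (truncateChar '\246'))
  -- U+0253 (ɓ) is 595; 595 `mod` 256 is 83, which is 'S'.
  print (truncateChar '\595')
```

The second case is the worrying one: nothing fails, the output is just wrong.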

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Jules Bean
Johan Tibell wrote: What *does* matter to the programmer is what encodings putStr and getLine use. AFAIK, they use lower 8 bits of unicode code point which is almost functionally equivalent to latin-1. Which is terrible! You should have to be explicit about what encoding you expect. Python 3000

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Johan Tibell
The benefit would be that if the input is not in latin-1 an exception could be thrown rather than returning a Char representing the wrong Unicode code point. I'm not sure what you mean here. All 256 possible values have a meaning. You're of course right. So we don't have a problem here.

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Magnus Therning
On 1/23/08, Johan Tibell wrote: [..] My proposal is for I/O functions to specify the encoding they use if they accept or return Chars (and Strings). If they deal in terms of bytes (e.g. socket functions) they should accept and return Word8s. Optionally, text I/O functions
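One way to read the proposal is that converting between Chars and bytes becomes an explicit step with a named encoding. The following is a rough sketch of what that could look like for Latin-1; encodeLatin1 and decodeLatin1 are hypothetical names, not functions from base or any library mentioned in the thread.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical: encode a String as Latin-1 bytes.
-- Fails with Nothing if any character is above U+00FF.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = traverse enc
  where
    enc c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

-- Hypothetical: decode Latin-1 bytes. This direction cannot fail,
-- since every byte value maps to a code point.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

main :: IO ()
main = do
  print (encodeLatin1 "\246")   -- representable in Latin-1
  print (encodeLatin1 "\595")   -- not representable
  print (map ord (decodeLatin1 [246]))
```

The types carry the distinction: bytes are [Word8], text is String, and the only way across is through a function that names the encoding.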

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Reinier Lamers
Johan Tibell wrote: What *does* matter to the programmer is what encodings putStr and getLine use. AFAIK, they use lower 8 bits of unicode code point which is almost functionally equivalent to latin-1. Which is terrible! You should have to be explicit about what encoding you expect.

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Ketil Malde
Johan Tibell writes: The benefit would be that if the input is not in latin-1 an exception could be thrown rather than returning a Char representing the wrong Unicode code point. I'm not sure what you mean here. All 256 possible values have a meaning. OTOH, going the other

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-23 Thread Johan Tibell
On Jan 23, 2008 2:11 PM, Magnus Therning wrote: Yes, this reflects my recent experience, Char is not a good representation for an 8-bit byte. This thread came out of my attempt to add a module to dataenc[1] that would make base64-string[2] obsolete. As you probably can

[Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Magnus Therning
I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to generate a different answer than length "a string" At the time I thought that the encoding (in my case UTF-8) was “leaking through”. After switching to GHC 6.8 the behaviour seems to have changed,

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Miguel Mitrofanov
chr . ord $ 'å' '\229' What would I have to do to get an 'å' from '229'? It seems you already have it; 'å' is the same as '\229'. But IO output is still 8-bit, so when you ask ghci to print 'å', you get '\229'. You can use the utf8-string library (from Hackage).
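The point that 'å' and '\229' are the same value can be checked directly. A minimal sketch (roundTrip is just an illustrative name): the '\229' that ghci prints is the escaped form produced by show, not a different character.

```haskell
import Data.Char (chr, ord)

-- Round-trip a character through its code point.
roundTrip :: Char -> Char
roundTrip = chr . ord

main :: IO ()
main = do
  print (ord 'å')               -- the code point of å
  print (roundTrip 'å' == 'å')  -- chr . ord is the identity on Char
  print ('å' == '\229')         -- two spellings of the same Char
  putStrLn (show 'å')           -- show escapes non-ASCII characters
```

So nothing needs to be done to "get an å from 229"; what differs is only how the value is displayed or written out.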

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Felipe Lessa
2008/1/22 Magnus Therning: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to generate a different answer than length "a string" I guess it's not very difficult to prove that ∀ f xs. length xs == length (map f xs) even
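The property quoted here is easy to spot-check. This is only a sketch with a made-up helper name (lengthPreserved); a few test cases are not a proof, but they illustrate why any difference in length would have to come from how the string literal was read, not from map.

```haskell
import Data.Char (ord)

-- map applies f to each element and never adds or drops one,
-- so the result always has the same length as the input.
lengthPreserved :: (a -> b) -> [a] -> Bool
lengthPreserved f xs = length xs == length (map f xs)

main :: IO ()
main = do
  print (lengthPreserved ord "a string")
  print (lengthPreserved ord "\246\229")
  print (lengthPreserved (* 2) [1 :: Int, 2, 3])
```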

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Duncan Coutts
On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to generate a different answer than length "a string" That seems unlikely. At the time I thought that the encoding (in my case

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Duncan Coutts
On Tue, 2008-01-22 at 12:56 +0300, Miguel Mitrofanov wrote: chr . ord $ 'å' '\229' What would I have to do to get an 'å' from '229'? It seems you already have it; 'å' is the same as '\229'. Yes. But IO output is still 8-bit, so when you ask ghci to print 'å', you get '\229'.

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Henning Thielemann
On Tue, 22 Jan 2008, Duncan Coutts wrote: At the time I thought that the encoding (in my case UTF-8) was “leaking through”. After switching to GHC 6.8 the behaviour seems to have changed, and mapping 'ord' on a string results in a list of ints representing the Unicode code point rather

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Duncan Coutts
On Tue, 2008-01-22 at 13:48 +0100, Henning Thielemann wrote: On Tue, 22 Jan 2008, Duncan Coutts wrote: At the time I thought that the encoding (in my case UTF-8) was “leaking through”. After switching to GHC 6.8 the behaviour seems to have changed, and mapping 'ord' on a string

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Magnus Therning
On 1/22/08, Duncan Coutts wrote: On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to generate a different answer than length "a string" That seems unlikely.

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Ian Lynagh
On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote: On 1/22/08, Duncan Coutts wrote: On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to

Re[2]: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Bulat Ziganshin
Hello Duncan, Tuesday, January 22, 2008, 1:36:44 PM, you wrote: Yes. GHC 6.8 treats .hs files as UTF-8 where it previously treated them as Latin-1. afair, it was changed since 6.6 -- Best regards, Bulat

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Reinier Lamers
Ian Lynagh wrote: On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote: On 1/22/08, Duncan Coutts wrote: On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Peter Verswyvelen
Ian Lynagh wrote: Prelude Data.Char> map ord "ö" [195,182] Prelude Data.Char> length "ö" 2 there are actually 2 bytes there, but your terminal is showing them as one character. So let's all switch to unicode ASAP and leave that horrible multi-byte-string-thing behind us? Cheers, Peter

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Magnus Therning
On 1/22/08, Ian Lynagh wrote: On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote: On 1/22/08, Duncan Coutts wrote: On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote: I vaguely remember that in GHC 6.6 code like this

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Jules Bean
Magnus Therning wrote: Yes, of course, stupid me. But it is still the UTF-8 representation of ö, not Latin-1, and this brings me back to my original question, is this an intentional change in 6.8? map ord "ö" [246] map ord "åɓｚ𝐀" [229,595,65370,119808] 6.8 produces Unicode code points
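The two answers seen in the thread for ö, [246] and [195,182], are the code point and its UTF-8 byte sequence respectively. A hand-rolled encoder shows the relationship. This is a sketch only: encodeChar is a hypothetical name, and for simplicity it does not reject surrogate code points; real code would more likely use a library such as utf8-string.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: UTF-8 encode a single code point.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, for the first byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying 6 bits of the code point.
    cont :: Int -> Word8
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  print (map ord "\246")               -- the code point, as in 6.8
  print (concatMap encodeChar "\246")  -- the UTF-8 bytes seen earlier
```

Seen this way, the older behaviour amounts to the source file's UTF-8 bytes ending up as individual Chars, whereas 6.8 decodes them into a single code point.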

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Derek Elkins
On Tue, 2008-01-22 at 07:45 -0200, Felipe Lessa wrote: 2008/1/22 Magnus Therning: I vaguely remember that in GHC 6.6 code like this length $ map ord "a string" being able to generate a different answer than length "a string" I guess it's not very difficult

Re: [Haskell-cafe] Has character changed in GHC 6.8?

2008-01-22 Thread Ian Lynagh
On Tue, Jan 22, 2008 at 03:59:24PM +0000, Magnus Therning wrote: Yes, of course, stupid me. But it is still the UTF-8 representation of ö, not Latin-1, and this brings me back to my original question, is this an intentional change in 6.8? Yes (in 6.8.2, to be precise). It's in the release