On 1/22/08, Ian Lynagh [EMAIL PROTECTED] wrote:
On Tue, Jan 22, 2008 at 03:59:24PM +, Magnus Therning wrote:
Yes, of course, stupid me. But it is still the UTF-8 representation of ö,
not Latin-1, and this brings me back to my original question, is this an
intentional change in 6.8?
Peter Verswyvelen [EMAIL PROTECTED] writes:
Prelude Data.Char> map ord "ö"
[195,182]
Prelude Data.Char> length "ö"
2
there are actually 2 bytes there, but your terminal is showing them as
one character.
So let's all switch to unicode ASAP and leave that horrible
multi-byte-string-thing behind
Ketil Malde wrote:
So let's all switch to unicode ASAP and leave that horrible
multi-byte-string-thing behind us?
You are being ironic, I take it?
No I just used wrong terminology. When I said unicode, I actually meant
UCS-x, and with multi-byte-string-thing I meant VARIABLE-length,
Peter Verswyvelen wrote:
Now I'm getting a bit confused here. To summarize, what encoding does
GHC 6.8.2 use for [Char]? UCS-32?
How dare you! Such a personal question! This is none of your business.
I jest, but the point is sound: the internal storage of Char is ghc's
business, and it
Peter Verswyvelen [EMAIL PROTECTED] writes:
No I just used wrong terminology. When I said unicode, I actually meant UCS-x,
You might as well say UCS-4, nobody uses UCS-2 anymore. It's been
replaced by UTF-16, which gives you the complexity of UTF-8 without
being compact (for 99% of existing
On Jan 23, 2008 11:56 AM, Jules Bean [EMAIL PROTECTED] wrote:
Peter Verswyvelen wrote:
Now I'm getting a bit confused here. To summarize, what encoding does
GHC 6.8.2 use for [Char]? UCS-32?
[snip]
What *does* matter to the programmer is what encodings putStr and
getLine use. AFAIK,
Johan Tibell wrote:
On Jan 23, 2008 11:56 AM, Jules Bean [EMAIL PROTECTED] wrote:
Peter Verswyvelen wrote:
Now I'm getting a bit confused here. To summarize, what encoding does
GHC 6.8.2 use for [Char]? UCS-32?
[snip]
What *does* matter to the programmer is what encodings putStr and
getLine
On Jan 23, 2008 12:13 PM, Jules Bean [EMAIL PROTECTED] wrote:
Presumably there wasn't a sufficiently good answer available in time for
haskell98.
Will there be one for haskell prime ?
What *does* matter to the programmer is what encodings putStr and
getLine use. AFAIK, they use lower 8 bits of unicode code point which
is almost functionally equivalent to latin-1.
Which is terrible! You should have to be explicit about what encoding
you expect. Python 3000
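The "lower 8 bits" behaviour described above can be modelled in a few lines. This is only a sketch of what the poster reports ("AFAIK"), not a copy of GHC's I/O code; the helper name truncateTo8Bits is hypothetical.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical model of "keep only the lower 8 bits of the code point".
-- Code points below 256 pass through unchanged (the same values as Latin-1);
-- anything above is silently mapped onto an unrelated character.
truncateTo8Bits :: Char -> Char
truncateTo8Bits c = chr (ord c .&. 0xFF)

main :: IO ()
main = do
  -- U+00F6 (246) survives; U+0253 (595 = 0x253) collapses to 0x53 (83, 'S').
  print (map (ord . truncateTo8Bits) "\246\595")
```

The second character shows why the behaviour is only "almost" equivalent to Latin-1: nothing fails, the data is just changed.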
Johan Tibell wrote:
What *does* matter to the programmer is what encodings putStr and
getLine use. AFAIK, they use lower 8 bits of unicode code point which
is almost functionally equivalent to latin-1.
Which is terrible! You should have to be explicit about what encoding
you expect. Python 3000
The benefit would be that if the input is not in latin-1 an exception
could be thrown rather than returning a Char representing the wrong
Unicode code point.
I'm not sure what you mean here. All 256 possible values have a meaning.
You're of course right. So we don't have a problem here.
On 1/23/08, Johan Tibell [EMAIL PROTECTED] wrote:
[..]
My proposal is for I/O functions to specify the encoding they use if
they accept or return Chars (and Strings). If they deal in terms of
bytes (e.g. socket functions) they should accept and return Word8s.
Optionally, text I/O functions
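One way to picture the proposal is a decoding function that takes bytes (Word8) and an explicitly named encoding, and reports failure instead of returning wrong code points. The following is a minimal sketch under that assumption; the names Encoding and decodeWith are hypothetical and not part of any library discussed in the thread, and the UTF-8 branch is simplified (it does not reject overlong forms or surrogates).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical: the caller must say which encoding the bytes are in.
data Encoding = Latin1 | Utf8

decodeWith :: Encoding -> [Word8] -> Either String String
-- Latin-1 cannot fail: every byte value 0..255 is a code point.
decodeWith Latin1 = Right . map (chr . fromIntegral)
decodeWith Utf8   = go
  where
    go [] = Right []
    go (b:bs)
      | b < 0x80           = (chr (fromIntegral b) :) <$> go bs
      | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F) bs
      | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F) bs
      | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07) bs
      | otherwise          = Left "invalid lead byte"

    -- Consume k continuation bytes, accumulating 6 bits from each.
    multi :: Int -> Int -> [Word8] -> Either String String
    multi 0 acc bs
      | acc > 0x10FFFF = Left "code point out of range"
      | otherwise      = (chr acc :) <$> go bs
    multi k acc (b:bs)
      | b .&. 0xC0 == 0x80 =
          multi (k - 1) ((acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)) bs
    multi _ _ _ = Left "truncated or invalid continuation byte"

main :: IO ()
main = do
  print (decodeWith Latin1 [195, 182])  -- two Chars, one per byte
  print (decodeWith Utf8   [195, 182])  -- one Char, U+00F6
  print (decodeWith Utf8   [246])       -- a Left: not valid UTF-8
```

Note that the Latin-1 branch never returns Left, which matches the observation elsewhere in the thread that all 256 byte values have a meaning in Latin-1; only the UTF-8 branch can detect bad input.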
Johan Tibell wrote:
What *does* matter to the programmer is what encodings putStr and
getLine use. AFAIK, they use lower 8 bits of unicode code point which
is almost functionally equivalent to latin-1.
Which is terrible! You should have to be explicit about what encoding
you expect.
Johan Tibell [EMAIL PROTECTED] writes:
The benefit would be that if the input is not in latin-1 an exception
could be thrown rather than returning a Char representing the wrong
Unicode code point.
I'm not sure what you mean here. All 256 possible values have a meaning.
OTOH, going the other
On Jan 23, 2008 2:11 PM, Magnus Therning [EMAIL PROTECTED] wrote:
Yes, this reflects my recent experience, Char is not a good representation
for an 8-bit byte. This thread came out of my attempt to add a module to
dataenc[1] that would make base64-string[2] obsolete. As you probably can
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
At the time I thought that the encoding (in my case UTF-8) was “leaking
through”. After switching to GHC 6.8 the behaviour seems to have
changed,
chr . ord $ 'å'
'\229'
What would I have to do to get an 'å' from '229'?
It seems you already have it; 'å' is the same as '\229'. But IO output is still
8-bit, so when you ask ghci to print 'å', you get '\229'. You can use the
utf8-string library (from Hackage).
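A small check of the point made here, written with the numeric escape so that it does not depend on how the source file is encoded. It also shows that the '\229' spelling comes from show, which escapes every character above 127; what putStrLn would emit for the raw character depends on the GHC version and the terminal, so that is deliberately left out. The name roundTrip is made up for this example.

```haskell
import Data.Char (chr, ord)

-- chr . ord is the identity on Char (hypothetical helper name).
roundTrip :: Char -> Char
roundTrip = chr . ord

main :: IO ()
main = do
  print (roundTrip '\229' == '\229')  -- True: nothing is lost
  print (ord '\229')                  -- 229
  putStrLn (show (chr 229))           -- prints '\229', the escaped form
```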
2008/1/22 Magnus Therning [EMAIL PROTECTED]:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
I guess it's not very difficult to prove that
∀ f xs. length xs == length (map f xs)
even
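The property quoted above can at least be spot-checked on a few strings. This is a sanity check rather than a proof, and the function name lengthPreserved is hypothetical.

```haskell
import Data.Char (ord)

-- map never adds or drops elements, so the lengths must agree.
lengthPreserved :: (a -> b) -> [a] -> Bool
lengthPreserved f xs = length xs == length (map f xs)

main :: IO ()
main = print (all (lengthPreserved ord) ["", "a string", "\246", "\229\595"])
```

So if the two lengths ever appeared to differ, the difference would have to come from how the string literal was read, not from map or ord.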
On Tue, 2008-01-22 at 09:29 +, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
That seems unlikely.
At the time I thought that the encoding (in my case
On Tue, 2008-01-22 at 12:56 +0300, Miguel Mitrofanov wrote:
chr . ord $ 'å'
'\229'
What would I have to do to get an 'å' from '229'?
It seems you already have it; 'å' is the same as '\229'.
Yes.
But IO output is still 8-bit, so when you ask ghci to print 'å', you get
'\229'.
On Tue, 22 Jan 2008, Duncan Coutts wrote:
At the time I thought that the encoding (in my case UTF-8) was “leaking
through”. After switching to GHC 6.8 the behaviour seems to have
changed, and mapping 'ord' on a string results in a list of ints
representing the Unicode code point rather
On Tue, 2008-01-22 at 13:48 +0100, Henning Thielemann wrote:
On Tue, 22 Jan 2008, Duncan Coutts wrote:
At the time I thought that the encoding (in my case UTF-8) was “leaking
through”. After switching to GHC 6.8 the behaviour seems to have
changed, and mapping 'ord' on a string
On 1/22/08, Duncan Coutts [EMAIL PROTECTED] wrote:
On Tue, 2008-01-22 at 09:29 +, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
That seems unlikely.
On Tue, Jan 22, 2008 at 03:16:15PM +, Magnus Therning wrote:
On 1/22/08, Duncan Coutts [EMAIL PROTECTED] wrote:
On Tue, 2008-01-22 at 09:29 +, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to
Hello Duncan,
Tuesday, January 22, 2008, 1:36:44 PM, you wrote:
Yes. GHC 6.8 treats .hs files as UTF-8 where it previously treated them
as Latin-1.
afair, it was changed since 6.6
--
Best regards,
Bulat mailto:[EMAIL PROTECTED]
Ian Lynagh wrote:
On Tue, Jan 22, 2008 at 03:16:15PM +, Magnus Therning wrote:
On 1/22/08, Duncan Coutts [EMAIL PROTECTED] wrote:
On Tue, 2008-01-22 at 09:29 +, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being
Ian Lynagh wrote:
Prelude Data.Char> map ord "ö"
[195,182]
Prelude Data.Char> length "ö"
2
there are actually 2 bytes there, but your terminal is showing them as
one character.
So let's all switch to unicode ASAP and leave that horrible
multi-byte-string-thing behind us?
Cheers,
Peter
On 1/22/08, Ian Lynagh [EMAIL PROTECTED] wrote:
On Tue, Jan 22, 2008 at 03:16:15PM +, Magnus Therning wrote:
On 1/22/08, Duncan Coutts [EMAIL PROTECTED] wrote:
On Tue, 2008-01-22 at 09:29 +, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
Magnus Therning wrote:
Yes, of course, stupid me. But it is still the UTF-8 representation of
ö, not Latin-1, and this brings me back to my original question, is
this an intentional change in 6.8?
map ord "ö"
[246]
map ord "åɓｚ𝐀"
[229,595,65370,119808]
6.8 produces Unicode code points
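The two answers seen in this thread for ö, [195,182] and [246], are the UTF-8 bytes and the Unicode code point of the same character. The sketch below derives the former from the latter; utf8Encode is a hypothetical helper written for illustration (it does not special-case surrogate code points), not a function from GHC's libraries.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- Encode one Unicode code point as its UTF-8 byte values.
utf8Encode :: Char -> [Int]
utf8Encode c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation bytes carry 6 payload bits under the prefix 10xxxxxx.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  print (ord '\246')         -- 246, the code point of ö
  print (utf8Encode '\246')  -- [195,182], the bytes seen earlier
```

Read this way, a literal whose bytes are taken one per Char gives a string of length 2, while a literal decoded as UTF-8 gives a single Char with value 246.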
On Tue, 2008-01-22 at 07:45 -0200, Felipe Lessa wrote:
2008/1/22 Magnus Therning [EMAIL PROTECTED]:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
I guess it's not very difficult
On Tue, Jan 22, 2008 at 03:59:24PM +, Magnus Therning wrote:
Yes, of course, stupid me. But it is still the UTF-8 representation of ö,
not Latin-1, and this brings me back to my original question, is this an
intentional change in 6.8?
Yes (in 6.8.2, to be precise).
It's in the release