Re: Haskell 1.4 and Unicode

Lennart Augustsson Fri, 7 Nov 1997 12:54:17 +0100 (MET)

Unicode was added at the last moment, so there are likely to
be some discrepancies.

> 1) I assume that layout processing occurs after Unicode preprocessing;
> otherwise, you can't even find the lexemes.  If so, are all Unicode
> characters assumed to be the same width?
I think that's what is intended.

> However, it would also seem quite reasonable to include class Lo
> (which includes things like "Hebrew letter Alef") in UNIsmall or
> UNIlarge; and to include some of the Punctuation classes in UNIsymbol.
It's hard to put Lo in a sensible place since Haskell relies on
the upper/lower distinction.  Therefore Lo is not included
in upper or lower.

> 3) What does it mean that Char can include any Unicode character?
It means that within a Haskell program Char can hold a Unicode character.

> If I compile and run the following program on my vanilla American UNIX
> box:
> 
>       main = putChar '\x2473' {- print a "circled number twenty" -}
> 
> to get a program "ctwenty", and I run
> 
>       ./ctwenty | od -c
> 
> (od prints out each byte of output), what will I see?
> 
> Will the following program
> 
>       main = getChar >>= (print . fromEnum)
> 
> ever print out a number greater than 256?
The I/O library has not been converted to Unicode.  So I would
expect implementations to silently truncate Unicode characters
to 8 bits.
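
As a rough illustration of that truncation, here is a minimal sketch.
It assumes the implementation simply keeps the low 8 bits of the code
point, which the report does not specify.  The name truncateTo8 is
made up for this example, and the code uses the present-day Data.Char
module rather than the Haskell 1.4 Char library.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of a character's
-- code point, modelling an I/O layer that writes one byte per Char.
-- (Assumption: truncation is "mod 256"; an implementation might
-- behave differently.)
truncateTo8 :: Char -> Char
truncateTo8 c = chr (ord c `mod` 256)
```

Under that assumption, truncateTo8 '\x2473' is '\x73', the letter s,
so the first program would emit a single byte.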

To do sensible output (or input) of Unicode characters you need to
encode them somehow.  Hbc comes with encode/decode functions (in the
Char library) for three encodings: two bytes per Char, UTF-8, and the
Java encoding (\uXXXX).

        -- Lennart

dogbert% cat ctwenty.hs
import Char
main = putStr (encodeUnicode "\x2473")
dogbert% hbc ctwenty.hs -o ctwenty
dogbert% ./ctwenty | od -c
0000000    $   s                                                        
0000002

dogbert% cat ctwenty.hs
import Char
main = putStr (encodeUTF8 "\x2473")
dogbert% hbc ctwenty.hs -o ctwenty
dogbert% ./ctwenty | od -c
0000000  342 221 263                                                    
0000003

dogbert% cat ctwenty.hs
import Char
main = putStr (encodeEscape "\x2473")
dogbert% hbc ctwenty.hs -o ctwenty
dogbert% ./ctwenty | od -c
0000000    \   u   2   4   7   3                                        
0000006
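
For readers without hbc, the three encodings can be approximated in
present-day Haskell.  This is only a sketch and not hbc's code: the
function names are made up, the high-byte-first order of the two-byte
form is inferred from the "$ s" output above, and leaving ASCII
characters unescaped in the \uXXXX form is an assumption.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Numeric (showHex)

-- Two bytes per Char, high byte first (assumed order).
-- Only meaningful for code points up to 0xFFFF.
encodeTwoBytes :: String -> String
encodeTwoBytes = concatMap enc
  where
    enc c = [ chr ((ord c `shiftR` 8) .&. 0xFF)
            , chr (ord c .&. 0xFF) ]

-- Standard UTF-8: one to four bytes per code point.
encodeUtf8Sketch :: String -> String
encodeUtf8Sketch = concatMap (map chr . enc . ord)
  where
    cont n = 0x80 .|. (n .&. 0x3F)          -- continuation byte 10xxxxxx
    enc n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
      | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
      | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]

-- Java-style escape: \u followed by four hex digits.
-- (Assumption: ASCII characters pass through unchanged.)
encodeEscapeSketch :: String -> String
encodeEscapeSketch = concatMap enc
  where
    enc c
      | ord c < 0x80 = [c]
      | otherwise    = "\\u" ++ pad (showHex (ord c) "")
    pad s = replicate (4 - length s) '0' ++ s
```

Applied to "\x2473", these give the bytes 0x24 0x73 ("$s"), the bytes
0xE2 0x91 0xB3 (octal 342 221 263), and the six characters \u2473,
which agrees with the three od listings above.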