> How safe is representing Unicode characters as Chars unsafeCoerce#d
> from large Ints? Seems to work in simple cases :-)
Er, "downright dangerous". There are lots of places where we assume that
Chars have only 8 bits of data, even though the representation has room for
32. For example, the Char primitives all use StgChar (unsigned char), and the
RTS has a fixed table containing all 256 Char constants (to avoid duplicating
them in the heap).
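
To see why an 8-bit assumption loses information, here is a small sketch of
what truncating a code point to 8 bits does. It is only an illustration of the
arithmetic, not a description of what any particular primitive or the RTS
actually does with an out-of-range Char; the name truncateTo8 is made up for
this example. Load it in GHCi to try it.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of a code point,
-- as storing it in an unsigned char would.
truncateTo8 :: Int -> Word8
truncateTo8 = fromIntegral

-- U+20AC (EURO SIGN) does not fit in 8 bits; only 0xAC survives.
euroTruncated :: Word8
euroTruncated = truncateTo8 0x20AC

-- A code point below 256 passes through unchanged.
latinUnchanged :: Word8
latinUnchanged = truncateTo8 0xE9
```

So two distinct code points such as 0x20AC and 0xAC become indistinguishable
once anything treats them as 8-bit values.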
You probably want to use Word32 or something for Unicode characters.
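
A minimal sketch of that Word32 approach, assuming you wrap the value in your
own type. CodePoint, mkCodePoint and fitsIn8Bits are names invented for this
example, not anything GHC provides; the range and surrogate checks follow the
Unicode code space (0 to 0x10FFFF, excluding 0xD800 to 0xDFFF).

```haskell
import Data.Word (Word32)

-- Hypothetical wrapper: a Unicode code point stored as a plain Word32,
-- so no Char (and no unsafeCoerce#) is involved.
newtype CodePoint = CodePoint Word32
  deriving (Eq, Ord, Show)

-- Build a CodePoint from an Int, rejecting values outside the Unicode
-- code space and the surrogate range.
mkCodePoint :: Int -> Maybe CodePoint
mkCodePoint n
  | n < 0 || n > 0x10FFFF      = Nothing
  | n >= 0xD800 && n <= 0xDFFF = Nothing
  | otherwise                  = Just (CodePoint (fromIntegral n))

-- True when the code point would also fit in an 8-bit Char.
fitsIn8Bits :: CodePoint -> Bool
fitsIn8Bits (CodePoint w) = w <= 0xFF
```

With this, mkCodePoint 0x20AC gives a value you can store and compare safely,
and fitsIn8Bits tells you whether converting it to a Char is safe.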
OTOH, it wouldn't be hard to change GHC's Char datatype to be a full 32-bit
integral data type.
Cheers,
Simon