Thomas Hartman wrote:


A translation of

http://www.ahinea.com/en/tech/perl-unicode-struggle.html

from Perl to Haskell would be a very useful piece of documentation, I think.

Perl stores both Unicode text and binary data in the same (dynamic) data type. Haskell, at least in theory, has separate types for them: [Char] for characters, and [Word8] or ByteString for sequences of bytes. I think the Haskell approach is better, because in most cases the programmer knows whether he wants to treat his data as characters or as bytes. Perl takes the Perlish "we guess what the coder means" approach, which leads to a lot of frustration when Perl guesses wrong.
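
To make the type distinction concrete, here is a minimal sketch using only the base library. The helper names encodeChar and encodeUtf8 are made up for this illustration and do not come from any particular package; a real program would more likely use a library encoder. The point is only that characters and bytes have different types, so moving from one to the other has to be an explicit step.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode a single character as UTF-8 bytes.
-- (Surrogate code points are not rejected; this is only a sketch.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, shifted down by s.
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: characters in, bytes out. The type says so.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- One character, two bytes: the lengths differ, and so do the types.
charCount :: Int
charCount = length "\233"              -- U+00E9, e with acute accent

byteCount :: Int
byteCount = length (encodeUtf8 "\233")
```

Passing the result of encodeUtf8 to a function that expects a String is a compile-time type error, which is the kind of mistake Perl would only reveal at run time, if at all.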

I think the Haskeller's Unicode problems will be different from the Perl hacker's: the Haskeller has to search for third-party modules that do what he wants, and finding those modules is the hard part. The Perl hacker has all the Unicode support built in, but occasionally has to fight Perl to keep it from doing byte operations on his Unicode data.

A colleague here went all but insane last week trying to use 'split' on a Unicode string in Perl on Windows. split would break the string in the middle of a multi-byte UTF-8 character, which crashed the UTF-8 processing later on.
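
I don't know exactly what Perl did in my colleague's case, so the following is not a reconstruction of it, just a sketch of the general failure mode in Haskell terms. The names (cafeBytes, startsMidCharacter and so on) are invented for the example. Splitting a list of bytes at an arbitrary offset can land inside a character; splitting a list of Char cannot.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- UTF-8 bytes for "caf" followed by U+00E9; the last character
-- occupies two bytes (0xC3 0xA9).
cafeBytes :: [Word8]
cafeBytes = [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- UTF-8 continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- True when a byte sequence begins in the middle of a character.
startsMidCharacter :: [Word8] -> Bool
startsMidCharacter (b:_) = isContinuation b
startsMidCharacter []    = False

-- Byte-level split at offset 4: cuts the final character in half.
brokenSplit :: ([Word8], [Word8])
brokenSplit = splitAt 4 cafeBytes

-- Character-level split: every element is a whole character,
-- so no offset can fall inside one.
cleanSplit :: (String, String)
cleanSplit = splitAt 3 "caf\233"
```

Here startsMidCharacter (snd brokenSplit) is True: the second half starts with a lone continuation byte, which any strict UTF-8 decoder further down the line will reject.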

Reinier
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
