Re: [Haskell-cafe] Re: Strings and utf-8

Reinier Lamers Thu, 29 Nov 2007 08:07:49 -0800

Thomas Hartman wrote:

A translation of

http://www.ahinea.com/en/tech/perl-unicode-struggle.html
from Perl to Haskell would be a very useful piece of documentation, I think.

Perl encodes both Unicode and binary data as the same (dynamic) datatype. Haskell - at least in theory - has two different types for them, namely [Char] for characters and [Word8] or ByteString for sequences of bytes. I think the Haskell approach is better, because the programmer in most cases knows whether he wants to treat his data as characters or as bytes. Perl does it the Perlish "We guess at what the coder means" way, which leads to a lot of frustration when Perl guesses wrong.
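To make the type distinction concrete, here is a minimal sketch that uses only the base library. The names encodeChar and encodeUtf8 are made up for this illustration and are not taken from any particular package; a real program would more likely use a maintained encoding library. It also does not reject surrogate code points, which a strict encoder should.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as its UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                            -- 1 byte (ASCII)
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                     -- 2 bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]            -- 3 bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]   -- 4 bytes
  where
    n = ord c
    -- Leading bits of the code point, placed in the lead byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Characters in, bytes out: the conversion is explicit in the type.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

In GHCi, length "na\239ve" is 5 (characters), while length (encodeUtf8 "na\239ve") is 6 (bytes), because U+00EF becomes the two bytes 195 and 175. The type checker will not let a [Word8] be passed where a String is expected, so the character/byte decision has to be made explicitly rather than guessed.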

The problems of the Haskeller trying to use Unicode, I think, will be different from those of the Perl hacker trying to use Unicode: the Haskeller will have to search for third-party modules to do what he wants, and finding those modules is the problem. The Perl hacker has all the Unicode support built in, but has to fight Perl occasionally to keep it from doing byte operations on his Unicode data.

I had a colleague here go all but insane last week trying to use 'split' on a Unicode string in Perl on Windows. split would break the string in the middle of a UTF-8 wide character, crashing UTF-8 processing later on.
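The general failure mode (not the exact Perl code involved, which I can't reproduce here) can be sketched in Haskell terms: cutting a UTF-8 byte sequence at an arbitrary byte offset may land inside a multi-byte character. The names below are hypothetical, and the check assumes the input is valid UTF-8 to begin with.

```haskell
import Data.Word (Word8)

-- A UTF-8 continuation byte has the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b >= 0x80 && b < 0xC0

-- Splitting at byte offset i cuts a character in half exactly when
-- the second half would start with a continuation byte.
-- Assumes the input is well-formed UTF-8.
splitsMidCharacter :: Int -> [Word8] -> Bool
splitsMidCharacter i bytes = case drop i bytes of
  (b:_) -> isContinuation b
  []    -> False

-- The two characters 'n' and U+00E9 encoded as three UTF-8 bytes.
sample :: [Word8]
sample = [0x6E, 0xC3, 0xA9]
```

Here splitsMidCharacter 1 sample is False (the cut falls between the two characters), but splitsMidCharacter 2 sample is True: the second piece would begin with the stray continuation byte 0xA9, which a later UTF-8 decoder has to reject. Splitting the [Char] value instead cannot produce this, since each list element is a whole character.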


Reinier
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
