Marcin 'Qrczak' Kowalczyk wrote:

> But the default encoding should
> come from the locale instead of being ISO-8859-1.

The problem with that is that, if the locale's encoding is UTF-8, a lot of stuff is going to break (i.e. anything in ISO-8859-* which isn't limited to the 7-bit ASCII subset).

The advantage of assuming ISO-8859-* is that the decoder can't fail; every possible stream of bytes is valid. This isn't the case for UTF-8. The advantage of ISO-8859-1 in particular is that it's trivial to convert the string back into the bytes which were actually read.

The key problem with using the locale is that you frequently encounter files which aren't in the locale's encoding, and for which the encoding can't easily be deduced.

If you assume ISO-8859-*, you can at least read them in, manipulate the contents (in any way that doesn't require interpreting any non-ASCII characters), and write out the results. OTOH, if you assume UTF-8 (e.g. because that happens to be the locale's encoding), the decoder is likely to abort shortly after the first non-ASCII character it finds (either that, or it will just silently drop characters).

-- 
Glynn Clements <[EMAIL PROTECTED]>
_______________________________________________
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe
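
As an illustration of the round-trip argument in the message above, here is a minimal sketch. It is written in Python purely for brevity and is not taken from the original post; the sample bytes and all variable names are made up for the example. It decodes arbitrary bytes as ISO-8859-1, edits only the ASCII part, and writes the bytes back out unchanged, then shows that a strict UTF-8 decoder rejects the same input and a lenient one loses data.

```python
# Hypothetical input: bytes in some unknown 8-bit encoding (not valid UTF-8).
raw = b"na\xefve caf\xe9 \xff\x80"

# ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding cannot fail and re-encoding restores the original bytes.
text = raw.decode("iso-8859-1")
round_trip = text.encode("iso-8859-1")

# An edit that touches only ASCII characters leaves the other bytes intact.
edited = text.replace("caf", "CAF")
out = edited.encode("iso-8859-1")

# A strict UTF-8 decoder aborts on the same input ...
try:
    raw.decode("utf-8")
    utf8_ok = True
    first_bad = None
except UnicodeDecodeError as err:
    utf8_ok = False
    first_bad = err.start  # offset of the first byte the decoder rejected

# ... and a lenient one silently drops the bytes it cannot interpret.
dropped = raw.decode("utf-8", errors="ignore")
```

With this input the strict UTF-8 decode fails at the first non-ASCII byte, while the ISO-8859-1 path returns every byte that was not deliberately edited.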
