Glynn Clements <[EMAIL PROTECTED]> writes:

> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
> or they may not (because G1 is assumed to contain JISX0208 initially).

I think such encodings are never used as default encodings of a Unix
locale.

>> The various UTF encodings do not have this particular problem; if a UTF
>> string is valid, then it is a unique representation of a unicode string.

BOM is a problem. Unfortunately, Unicode mandates that U+FEFF at the
start of a UTF-8 text stream is a mark which doesn't belong to the text.
It provides variants of UTF-16/32 with and without a BOM, but UTF-8 only
has the variant with a BOM. This makes UTF-8 a stateful encoding.

Unix ignores this: it doesn't use a BOM in UTF-8, except in individual
applications for individual file formats. Neither iconv() on Linux nor
the one in libiconv processes a BOM in UTF-8 (although in libiconv this
is because it's old, being based on an old RFC with 31-bit code points
which didn't include a BOM).

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
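
To make the BOM point in the message concrete, here is a small sketch of how the same bytes yield different text depending on whether a leading U+FEFF is treated as a signature or as an ordinary character. Python is used purely for illustration and is not part of the original thread; the sample string `notes.txt` is made up, and the two codec names (`utf-8`, `utf-8-sig`) are Python's own, not something iconv() provides.

```python
import codecs

# A hypothetical filename, encoded without and with a UTF-8 signature (EF BB BF).
plain = "notes.txt".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

# The plain "utf-8" codec does not interpret the BOM: U+FEFF stays in the text.
kept = with_bom.decode("utf-8")

# The "utf-8-sig" codec treats a leading U+FEFF as a signature and drops it.
stripped = with_bom.decode("utf-8-sig")

# Two different byte strings now map to the same text under "utf-8-sig",
# so the encoded form is no longer a unique representation of the string.
same_text = stripped == plain.decode("utf-8-sig")

print(repr(kept))
print(repr(stripped))
print(same_text)
```

Under the signature-aware reading, two distinct byte sequences decode to one string, which is the loss of uniqueness the message objects to. Under the plain reading the bytes round-trip exactly, which matches the "Unix ignores this" behaviour described above.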