Re: [Haskell-cafe] File path programme

Marcin 'Qrczak' Kowalczyk Sun, 30 Jan 2005 12:30:25 -0800

Glynn Clements <[EMAIL PROTECTED]> writes:

> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
> or they may not (because G1 is assumed to contain JISX0208 initially).


I think such encodings are never used as default encodings of a Unix
locale.

>> The various UTF encodings do not have this particular problem; if a UTF 
>> string is valid, then it is a unique representation of a unicode string.

The BOM is a problem. Unfortunately, Unicode mandates that U+FEFF at the
start of a UTF-8 text stream is a mark which doesn't belong to the
text. It provides variants of UTF-16/32 with and without a BOM, but
UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
encoding.
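
To illustrate the ambiguity, here is a minimal sketch in Python (chosen
only for illustration; the name "name.txt" and the two helper functions
are hypothetical). It assumes Python's standard codecs, where "utf-8"
keeps a leading U+FEFF as part of the text and "utf-8-sig" treats it as
a mark and drops it. Two different byte strings then compare as equal or
unequal depending on which interpretation the decoder takes:

```python
import codecs

# The UTF-8 encoding of U+FEFF: b'\xef\xbb\xbf'
BOM = codecs.BOM_UTF8


def decode_keep_bom(data: bytes) -> str:
    """Decode UTF-8, treating a leading U+FEFF as ordinary text."""
    return data.decode("utf-8")


def decode_strip_bom(data: bytes) -> str:
    """Decode UTF-8, treating a leading U+FEFF as a mark and dropping it."""
    return data.decode("utf-8-sig")


# Hypothetical file name, encoded with and without a leading BOM.
without_bom = "name.txt".encode("utf-8")
with_bom = BOM + without_bom

# The byte strings differ, so a byte-oriented file system sees two names.
print(with_bom == without_bom)

# A BOM-preserving decoder also sees two different strings ...
print(decode_keep_bom(with_bom) == decode_keep_bom(without_bom))

# ... while a BOM-stripping decoder maps both to the same text.
print(decode_strip_bom(with_bom) == decode_strip_bom(without_bom))
```

Under the BOM-stripping reading, a valid UTF-8 byte string is no longer
the unique representation of its text, which is the point at issue.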

Unix ignores this: it doesn't use a BOM in UTF-8, except in individual
applications for individual file formats. iconv(), both on Linux and
in libiconv, doesn't process a BOM in UTF-8 (although in libiconv this
is because it's old, being based on an old RFC with 31-bit code points
which didn't include a BOM).

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
