Re: [Haskell-cafe] invalid character encoding

Wolfgang Thaller Fri, 18 Mar 2005 22:10:21 -0800

Glynn Clements wrote:

OK, so the intermediate string will be nonsense if ISO-8859-1 isn't
the correct encoding, but that doesn't actually matter a lot of the
time; frequently, you're just grabbing a "blob" of data from one
function and passing it to another.

Yes. Of course, this also means that Strings representing non-ASCII filenames will *always* be nonsense on Mac OS X and other UTF-8-based platforms.
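
To make both points concrete, here is a minimal sketch using only the base library. The names decodeLatin1, encodeLatin1 and utf8Name are made up for this illustration and are not part of any library discussed in the thread. It shows that the ISO-8859-1 round trip returns the original bytes, while the intermediate String for a UTF-8 file name is not the text the user would expect.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decode bytes as ISO-8859-1: every byte maps to the code point
-- with the same numeric value, so decoding can never fail.
-- (Hypothetical helper, named for this example only.)
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encode back to bytes. This is only the inverse of decodeLatin1
-- for characters below U+0100; larger code points would be truncated.
encodeLatin1 :: String -> [Word8]
encodeLatin1 = map (fromIntegral . ord)

-- The UTF-8 bytes of the file name "é.txt"
-- (U+00E9 is encoded as 0xC3 0xA9 in UTF-8).
utf8Name :: [Word8]
utf8Name = [0xC3, 0xA9, 0x2E, 0x74, 0x78, 0x74]

-- The intermediate String has six characters, starting with
-- U+00C3 and U+00A9, instead of the five characters of "é.txt".
intermediate :: String
intermediate = decodeLatin1 utf8Name

-- Passing the "blob" on unchanged still yields the original bytes.
roundTripped :: [Word8]
roundTripped = encodeLatin1 intermediate
```

The round trip is harmless as long as the String is only passed along; the nonsense becomes visible as soon as the String is displayed, compared, or combined with text that really is Unicode.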

The problems will only appear once you start dealing with fallible or
non-reversible encodings such as UTF-8 or ISO-2022.

In what way is ISO-2022 non-reversible? Is it possible that an ISO-2022 file name that is converted to Unicode cannot be converted back any more (assuming you know for sure that it was ISO-2022 in the first place)?

Of course, it's quite possible that the only test cases will be people
using UTF-8-only (or even ASCII-only) systems, in which case you won't
see any problems.

I'm kind of hoping that we can just ignore a problem that is so rare that a large and well-known project like GTK2 can get away with ignoring it. Also, IIRC, Java strings are supposed to be Unicode, too - how do they deal with the problem?

So we can't do Unicode-based I18N because there exist a few unix
systems with messed-up file systems?


Declaring such systems to be "messed up" won't make the problems go
away. If a design doesn't work in reality, it's the fault of the
design, not of reality.

In general, yes. But we're not talking about all of reality here, we're talking about one small part of reality - the question is, can the part of reality where the design doesn't work be ignored?

For example, as soon as we use any kind of path names in our APIs, we are ignoring reality on good old "Classic" Mac OS (may it rest in peace). Path names don't always uniquely denote a file there (although they do most of the time). People writing cross-platform software have been ignoring this fact for a long time now.

I think that if we wait long enough, the filename encoding problems will become irrelevant and we will live in an ideal world where Unicode actually works. Maybe next year, maybe only in ten years. And while we are arguing about how far we are from that ideal world, we should think about alternatives. The current hack is really just a hack, and I don't want to see this hack become the new accepted standard.

Do we have other alternatives? Preferably something that provides other advantages over a Unicode String than just making things work on systems that many users never encounter, otherwise almost no one will bother to use it. So maybe we should start looking for _other_ reasons to represent file names and paths by an abstract datatype or something?
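
As one possible shape for such an abstract datatype, here is a rough sketch. Everything in it (RawPath, fromBytes, toBytes, appendComponent, displayPath) is hypothetical and invented for this illustration; it assumes Unix-style paths with '/' as the separator and is not a proposal for an actual library API.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical abstract path type: it stores exactly the bytes the
-- operating system handed out, so no decoding step can fail or lose data.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Ord)

-- Wrap bytes received from the OS.
fromBytes :: [Word8] -> RawPath
fromBytes = RawPath

-- Unwrap the bytes again to pass them back to the OS unchanged.
toBytes :: RawPath -> [Word8]
toBytes (RawPath bs) = bs

-- Join a directory and a component with a '/' byte (0x2F).
-- Assumes Unix-style separators; other platforms would differ.
appendComponent :: RawPath -> RawPath -> RawPath
appendComponent (RawPath dir) (RawPath name) = RawPath (dir ++ [0x2F] ++ name)

-- Lossy rendering for display only: ASCII bytes are shown as they are,
-- every other byte becomes '?'. A real design would decode using the
-- platform's encoding where it is known.
displayPath :: RawPath -> String
displayPath (RawPath bs) = map render bs
  where
    render b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = '?'
```

With a type like this, code that only passes paths around never touches an encoding; a conversion to String happens only where a path is shown to the user, and that is the one place where a lossy result is tolerable.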

Cheers,

Wolfgang

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
