Glynn Clements <[EMAIL PROTECTED]> writes:

>> Ok, but let it be in addition to, not instead treating them as
>> character strings.
>
> Provided that you know the encoding, nothing stops you converting
> them to strings, should you have a need to do so.

There are already APIs which use Strings for filenames. I meant to keep
them and let them use a program-settable encoding which defaults to the
locale encoding. This is the only sane interpretation of this interface
on Unix I can imagine. In addition to them we may have APIs which use
byte strings, for those who prefer the ability to handle all filenames
over a uniform string representation inside the program.

>> Such encodings are not suitable for filenames.
>
> Regardless of whether they are "suitable", they are used.

Using ISO-2022 as a filename encoding is a bad and unsupported idea.
The '/' byte does not necessarily mean that the '/' character is there,
so some random subset of characters is excluded. Statefulness means
that the same filename may be interpreted as different characters
depending on context.

There is no need to support ISO-2022 as a filename encoding in
languages and tools. The fact that some tool doesn't support ISO-2022
in filenames is not a flaw in the tool, so there is no need to check
what happens when filenames are represented in ISO-2022. If they are,
someone should fix his system.

> I haven't addressed any of the other stuff about ISO-2022, as it isn't
> really relevant. Whether ISO-2022 is good or bad doesn't matter; what
> matters is that it is likely to remain in use for the foreseeable
> future.

For transportation, not for the locale encoding nor for filenames.
There are no ISO-2022 locales. A program may support it when the data
it operates on requests recoding between explicit encodings, e.g. if
it's found in an email, but there is no need to support it as the
default encoding of a program (which e.g. the withCString function
should use).

>> IMHO it's more important to make them compatible with the
>> representation of strings used in other parts of the program.
>
> Why?

To limit conversion hassle to I/O, instead of scattering it through the
program wherever filenames and other strings meet.
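
The point above about the '/' byte can be made concrete with a small
sketch. It assumes ISO-2022-JP (RFC 1468) as the example encoding, which
is my choice for illustration and is not named in the thread. The names
kuFileName and splitOnSlash are hypothetical, and splitOnSlash only
stands in for what an encoding-unaware, byte-oriented tool would do.

```haskell
import Data.Word (Word8)

-- Escape sequences of ISO-2022-JP (assumed example encoding).
toJisX0208, toAscii :: [Word8]
toJisX0208 = [0x1B, 0x24, 0x42]  -- ESC $ B: switch to two-byte JIS X 0208
toAscii    = [0x1B, 0x28, 0x42]  -- ESC ( B: switch back to ASCII

-- A filename consisting of the single hiragana KU (U+304F), whose
-- JIS X 0208 code is 0x24 0x2F. No '/' character is present, yet the
-- second byte equals ASCII '/'. Read in the ASCII state, the same two
-- bytes would be the characters "$/".
kuFileName :: [Word8]
kuFileName = toJisX0208 ++ [0x24, 0x2F] ++ toAscii

slash :: Word8
slash = 0x2F

-- Hypothetical helper: split a path on the '/' byte, the way a tool
-- that knows nothing about the encoding would.
splitOnSlash :: [Word8] -> [[Word8]]
splitOnSlash bs = case break (== slash) bs of
  (component, [])       -> [component]
  (component, _ : rest) -> component : splitOnSlash rest
```

Here splitOnSlash kuFileName yields two components, cutting the
character in half, even though the name contains no '/' character at
all. A kernel that treats 0x2F as the path separator would do the same.
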

>> But otherwise programs would continuously have bugs in handling text
>> which is not ISO-8859-1, especially with multibyte encoding where
>> pretending that ISO-8859-2 is ISO-8859-1 too often doesn't work.
>
> Why?

Because some channels talk in terms of characters, or bytes in a known
encoding, instead of bytes in an implicit encoding. E.g. most display
channels, apart from raw stdin/stdout and narrow character ncurses;
many Internet protocols, apart from irc; .NET and Java; file formats
like XML; some databases.

And the world is slowly shifting to have more such channels, which
replace byte streams in an implicit encoding, because after reaching a
critical mass (where encodingless channels no longer sit in the middle,
losing information about the encoding or losing some characters) they
make multilingual handling more robust.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe