On 06/11/2011 16:56, John Millikin wrote:
2011/11/6 Max Bolingbroke<batterseapo...@hotmail.com>:
On 6 November 2011 04:14, John Millikin<jmilli...@gmail.com>  wrote:
For what it's worth, on my Ubuntu system, Nautilus ignores the locale
and just treats all paths as either UTF8 or invalid.
To me, this seems like the most reasonable option; the concept of
"locale encoding" is entirely vestigial, and should only be used in
certain specialized cases.

Unfortunately non-UTF8 locale encodings are seen in practice quite
often. I'm not sure about Linux, but certainly lots of Windows systems
are configured with a locale encoding like GBK or Big5.

This doesn't really matter for file paths, though. The Win32 file API
uses wide-character functions, which ought to work with Unicode text
regardless of what the user set their locale to.

Paths as text is what *Windows* programmers expect. Paths as bytes is
what's expected by programmers on non-Windows OSes, including Linux
and OS X.

IIRC paths on OS X are guaranteed to be valid UTF-8. The only platform
that uses bytes for paths (that we care about) is Linux.

UTF-8 is bytes. It can be treated as text in some cases, but it's
better to think about it as bytes.

I'm not saying one is inherently better than the other, but
considering that various UNIX and UNIX-like operating systems have
been using byte-based paths for near on forty years now, trying to
abolish them by redefining the type is not a useful action.

We have to:
  1. Provide an API that makes sense on all our supported OSes
  2. Have getArgs :: IO [String]
  3. Have it such that if you go to your console and write
(./MyHaskellProgram 你好) then getArgs tells you ["你好"]

Given these constraints I don't see any alternative to PEP-383 behaviour.

Requirement #1 directly contradicts #2 and #3.

If you're going to make all the System.IO stuff use text, at least
give us an escape hatch. The "unix" package is ideally suited, as it's
already inherently OS-specific. Something like this would be perfect:

You can already do this with the implemented design. We have:

openFile :: FilePath ->  IO Handle

The FilePath will be encoded in the fileSystemEncoding. On Unix this
will have PEP383 roundtripping behaviour. So if you want openFile' ::
[Byte] ->  IO Handle you can write something like this:

escape = map (\b -> if b < 128 then chr (fromIntegral b) else chr (0xEF00 + fromIntegral b))
openFile' = openFile . escape

The bytes that reach the API call will be exactly the ones you supply.
(You can also implement "escape" by just encoding the [Byte] with the
fileSystemEncoding).
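
To make the escaping idea concrete, here is a minimal, pure sketch of the byte-to-Char mapping together with its inverse. The names escapeBytes and unescapeChars are hypothetical, and the 0xEF00 offset is simply the one used in the snippet above; the code points actually used by the IO library may differ between GHC releases, so treat this as a model of the idea rather than the library's implementation.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Hypothetical helper: map each byte to a Char. ASCII bytes map to
-- themselves; every other byte goes to the code point 0xEF00 + byte
-- (offset assumed from the message above).
escapeBytes :: [Word8] -> String
escapeBytes = map toChar
  where
    toChar b
      | b < 128   = chr (fromIntegral b)
      | otherwise = chr (0xEF00 + fromIntegral b)

-- | Hypothetical inverse of 'escapeBytes'. Returns Nothing if the
-- String contains a Char that 'escapeBytes' could not have produced.
unescapeChars :: String -> Maybe [Word8]
unescapeChars = mapM toByte
  where
    toByte c
      | n < 128                    = Just (fromIntegral n)
      | n >= 0xEF80 && n <= 0xEFFF = Just (fromIntegral (n - 0xEF00))
      | otherwise                  = Nothing
      where n = ord c
```

For any byte list bs, unescapeChars (escapeBytes bs) gives back Just bs, which is the round-trip property the argument relies on. Note that this sketch escapes every non-ASCII byte individually and makes no attempt to decode valid multi-byte sequences as text.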

Likewise, if you have a String and want to get the [Byte] we decoded
it from, you just need to encode the String again with the
fileSystemEncoding.

If this is not enough for you please let me know, but it seems to me
that it covers all your use cases, without any need to reimplement the
FFI bindings.

This is not enough, since these strings are still being passed through
the potentially (and in 7.2.1, actually) broken path encoder.

I think you might be misunderstanding how the new API works. Basically, imagine a reversible transformation:

  encode :: String -> [Word8]
  decode :: [Word8] -> String

This transformation is applied in the appropriate direction by the IO library to translate filesystem paths into FilePath and vice versa. No information is lost; furthermore, you can apply the transformation yourself in order to recover the original [Word8] from a String, or to inject your own [Word8] file path.
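
As a rough sketch of applying that transformation yourself, the following uses GHC.Foreign and Data.ByteString from a later base/bytestring than the release under discussion. The names decodePath and encodePath are hypothetical, and getFileSystemEncoding is the accessor in later versions of base (this thread refers to it as fileSystemEncoding), so take this as an illustration under those assumptions rather than the API as it stood in 7.2.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- | Hypothetical helper: decode raw path bytes into a FilePath using
-- the file system encoding, as the IO library does for paths it reads.
decodePath :: B.ByteString -> IO FilePath
decodePath bytes = do
  enc <- getFileSystemEncoding
  B.useAsCStringLen bytes (GHC.peekCStringLen enc)

-- | Hypothetical helper: encode a FilePath back into the bytes that
-- would be handed to the operating system.
encodePath :: FilePath -> IO B.ByteString
encodePath path = do
  enc <- getFileSystemEncoding
  GHC.withCStringLen enc path B.packCStringLen
```

On a Unix system, where the file system encoding has the round-tripping behaviour described earlier in the thread, feeding the result of decodePath to encodePath should return the original bytes; on other platforms that guarantee does not necessarily hold.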

Ok?

All this does is mean that the common case where you want to interpret file system paths as text works with no fuss, without breaking anything in the case when the file system paths are not actually text.

It would probably be better to have an abstract FilePath type and to keep the original bytes, decoding on demand. But that is a big change to the API and would break much more code. One day we'll do this properly; for now we have this, which I think is a pretty reasonable compromise.

Cheers,
        Simon


_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
