On 10/11/2011 09:28, Max Bolingbroke wrote:
> Is there any consensus about what to do here? My take is that we
> should move back to lone surrogates. This:
>  1. Recovers the roundtrip property, which we appear to believe is essential
>  2. Removes all the weird problems I outlined earlier that can occur
>     if your byte strings happen to contain some bytes that decode to
>     U+EFxx
>  3. DOES break software that expects Strings not to contain surrogate
>     codepoints, but (I agree with you) this is arguably a feature
>
> This is also exactly what Python does so it has the advantage of being
> battle tested.
>
> Agreed?

Agreed.
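
For concreteness, here is a minimal sketch of the lone-surrogate escaping idea. It assumes the mapping Python uses (an undecodable byte b becomes the code point U+DC00 + b, so escapes fall in U+DC80..U+DCFF), and it uses a toy ASCII-only decoder rather than a real UTF-8 one. The names escapeDecode and escapeEncode are made up for illustration and are not part of any library.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy decoder (assumption: only ASCII counts as "decodable").
-- ASCII bytes decode to themselves; any other byte b is escaped as
-- the lone low surrogate U+DC00 + b, i.e. somewhere in U+DC80..U+DCFF.
escapeDecode :: [Word8] -> String
escapeDecode = map go
  where
    go b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Inverse of escapeDecode. Escape surrogates turn back into the
-- original byte; characters this toy scheme cannot represent give Nothing.
escapeEncode :: String -> Maybe [Word8]
escapeEncode = traverse go
  where
    go c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where
        n = ord c
```

The roundtrip property is then that escapeEncode (escapeDecode bs) gives back Just bs for every byte string bs. A real decoder would attempt a proper UTF-8 decode first and escape only the bytes that fail.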

> We can additionally:
>
>  * Provide your layer in the "unix" package where FilePath =
>    ByteString, for people who for some reason care about performance of
>    their FilePath encoding/decoding, OR who don't want to rely on the
>    roundtripping property being implemented correctly

I think I'll do this anyway.

>  * Perhaps provide a layer in the "win32" package where FilePath =
>    ByteString but where that ByteString is guaranteed to be UTF-16
>    encoded (I'm less sure about this, because we can always unambiguously
>    decode this without doing any escaping. It's still useful if you care
>    about performance.)
>
> I'm wondering if we should also have hSetLocaleEncoding,
> hSetFileSystemEncoding :: TextEncoding -> IO () and change
> localeEncoding, fileSystemEncoding :: IO TextEncoding.
> hSetFileSystemEncoding in particular would let people opt out of
> escapes entirely as long as they issued it right at the start of their
> program before the fileSystemEncoding had been used.

Ok by me.
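
To illustrate why the getters would have to move into IO, here is a rough sketch of one way such a setting could be stored: a global mutable cell holding the current encoding. This is only an assumption about the shape of the thing, not a proposed implementation. The names fsEncodingRef, fileSystemEncodingSketch and setFileSystemEncodingSketch are hypothetical, and the utf8 default and the latin1 override are placeholders.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO (TextEncoding, latin1, utf8)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global cell holding the current file system encoding.
-- NOINLINE keeps GHC from duplicating the cell.
{-# NOINLINE fsEncodingRef #-}
fsEncodingRef :: IORef TextEncoding
fsEncodingRef = unsafePerformIO (newIORef utf8)

-- Reading the encoding is an IO action because the cell can change.
fileSystemEncodingSketch :: IO TextEncoding
fileSystemEncodingSketch = readIORef fsEncodingRef

-- Overwrite the encoding; only reads that happen afterwards see it.
setFileSystemEncodingSketch :: TextEncoding -> IO ()
setFileSystemEncodingSketch = writeIORef fsEncodingRef

-- Usage: switch encodings first thing, then read the setting back.
demo :: IO String
demo = do
  setFileSystemEncodingSketch latin1
  show <$> fileSystemEncodingSketch
```

Anything that read the cell before the setter ran would already have used the old encoding, which is why the call has to come right at the start of the program.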
Cheers,
Simon