A big thank-you to Max for pushing this change through. It was not just a question of hacking, but also of running a discussion about the spec and establishing a consensus in a rather complex area. Thank you and well done!

Simon

| -----Original Message-----
| From: [email protected] [mailto:[email protected]] On
| Behalf Of Max Bolingbroke
| Sent: 14 May 2011 23:06
| To: [email protected]
| Subject: [commit: base] master: Big patch to improve Unicode support in GHC.
| Validated on OS X and Windows, this (509f28c)
|
| Repository : ssh://darcs.haskell.org//srv/darcs/packages/base
|
| On branch  : master
|
| http://hackage.haskell.org/trac/ghc/changeset/509f28cc93b980d30aca37008cbe66c677a0d6f6
|
| >---------------------------------------------------------------
|
| commit 509f28cc93b980d30aca37008cbe66c677a0d6f6
| Author: Max Bolingbroke <[email protected]>
| Date:   Sat May 14 22:50:46 2011 +0100
|
|     Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this
|     patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.
|
|     The major changes are:
|
|      1) Make Foreign.C.String.*CString use the locale encoding
|
|         This change follows the FFI specification in Haskell 98, which
|         has never actually been implemented before.
|
|         The functions exported from Foreign.C.String are partially-applied
|         versions of those from GHC.Foreign, which allows the user to supply
|         their own TextEncoding.
|
|         We also introduce foreignEncoding as the name of the text encoding
|         that follows the FFI appendix in that it transliterates encoding
|         errors.
|
|      2) I also changed the code so that mkTextEncoding always tries the
|         native-Haskell decoders in preference to those from iconv, even on
|         non-Windows. The motivation here is simply that it is better for
|         compatibility if we do this, and those are the ones you get for
|         the utf* and latin1* predefined TextEncodings anyway.
|
|      3) Implement surrogate-byte error handling mode for TextEncoding
|
|         This implements PEP383-like behaviour so that we are able to
|         roundtrip byte strings through Strings without loss of information.
|
|         The withFilePath function now uses this encoding to get to/from CStrings,
|         so any code that uses that will get the right PEP383 behaviour automatically.
|
|      4) Implement three other coding failure modes: ignore, throw error,
|         transliterate
|
|         These mimic the behaviour of the GNU Iconv extensions.
|
|  Control/Exception/Base.hs      |    2 +-
|  Foreign/C/String.hs            |   44 +++++++-
|  GHC/Conc/Windows.hs            |   16 +--
|  GHC/Environment.hs             |   36 +++++-
|  GHC/Foreign.hs                 |  255 ++++++++++++++++++++++++++++++++++++++++
|  GHC/IO.hs                      |   14 ++-
|  GHC/IO/Encoding.hs             |   78 +++++++++----
|  GHC/IO/Encoding.hs-boot        |    6 +
|  GHC/IO/Encoding/CodePage.hs    |   96 ++++++++-------
|  GHC/IO/Encoding/Failure.hs     |  129 ++++++++++++++++++++
|  GHC/IO/Encoding/Iconv.hs       |  114 ++++++------------
|  GHC/IO/Encoding/Latin1.hs      |   77 +++++++------
|  GHC/IO/Encoding/Types.hs       |   37 ++++--
|  GHC/IO/Encoding/UTF16.hs       |  149 ++++++++++++------------
|  GHC/IO/Encoding/UTF32.hs       |  146 +++++++++++++----------
|  GHC/IO/Encoding/UTF8.hs        |  101 +++++++++-------
|  GHC/IO/FD.hs                   |   11 +--
|  GHC/IO/Handle/Internals.hs     |   42 ++++++-
|  GHC/Windows.hs                 |   44 +++++++
|  System/Environment.hs          |  219 ++++++++++++++++++++++++++++-------
|  System/IO.hs                   |    2 +-
|  System/Posix/Internals.hs      |   32 +++++
|  System/Posix/Internals.hs-boot |    7 +
|  base.cabal                     |    3 +
|  24 files changed, 1214 insertions(+), 446 deletions(-)
|
|
| Diff suppressed because of size. To see it, use:
|
|     git show 509f28cc93b980d30aca37008cbe66c677a0d6f6
|
| _______________________________________________
| Cvs-libraries mailing list
| [email protected]
| http://www.haskell.org/mailman/listinfo/cvs-libraries
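
For readers who want to try the behaviour described in points 1, 3 and 4 of the commit message, here is a minimal sketch. It assumes a reasonably recent base in which GHC.Foreign exports peekCStringLen and withCStringLen taking an explicit TextEncoding, and in which mkTextEncoding accepts the iconv-style "//ROUNDTRIP" and "//IGNORE" suffixes; the API as of this 2011 commit may have differed in detail. The helper names roundTrip and decodeWith are hypothetical and do not come from the patch.

```haskell
module Main where

import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GF
import GHC.IO.Encoding (TextEncoding, mkTextEncoding)

-- Hypothetical helper: decode raw bytes into a String using the
-- given encoding (including its failure mode).
decodeWith :: TextEncoding -> [Word8] -> IO String
decodeWith enc bytes =
  withArrayLen bytes $ \n p ->
    GF.peekCStringLen enc (castPtr p, n)

-- Hypothetical helper: decode bytes to a String, then encode that
-- String back to bytes with the same encoding.
roundTrip :: TextEncoding -> [Word8] -> IO [Word8]
roundTrip enc bytes = do
  s <- decodeWith enc bytes
  GF.withCStringLen enc s $ \(q, m) -> peekArray m (castPtr q)

main :: IO ()
main = do
  -- 0xff is never valid in UTF-8, so this input exercises error handling.
  let input = [0x66, 0x6f, 0xff, 0x6f] :: [Word8]

  -- Surrogate-escape (PEP383-like) mode: the bytes should survive
  -- the trip through String unchanged.
  rt <- mkTextEncoding "UTF-8//ROUNDTRIP"
  roundTrip rt input >>= print

  -- Ignore mode: undecodable bytes are dropped while decoding.
  ig <- mkTextEncoding "UTF-8//IGNORE"
  decodeWith ig input >>= putStrLn
```

The point of passing the TextEncoding explicitly is the one made in the commit message: Foreign.C.String fixes the encoding for you, while GHC.Foreign lets the caller choose both the encoding and the failure mode.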
