Hello John,

Friday, February 03, 2006, 3:39:38 AM, you wrote:

>> Got a unicode-compliant compiler?

JM> sure do :)
JM> but it currently doesn't recognize any unicode characters as possible
JM> operators.

Have you read this? :)

> Log:
> Add support for UTF-8 source files
>
> GHC finally has support for full Unicode in source files. Source
> files are now assumed to be UTF-8 encoded, and the full range of
> Unicode characters can be used, with classifications recognised using
> the implementation from Data.Char. This incidentally means that only
> the stage2 compiler will recognise Unicode in source files, because I
> was too lazy to port the unicode classifier code into libcompat.
>
> Additionally, the following synonyms for keywords are now recognised:
>
>   forall symbol        (U+2200)   forall
>   right arrow          (U+2192)   ->
>   left arrow           (U+2190)   <-
>   horizontal ellipsis  (U+22EF)   ..
>
> There are probably more things we could add here.
>
> This will break some source files if Latin-1 characters are being used.
> In most cases this should result in a UTF-8 decoding error. Later on
> if we want to support more encodings (perhaps with a pragma to specify
> the encoding), I plan to do it by recoding into UTF-8 before parsing.
>
> Internally, there were some pretty big changes:
>
>   - FastStrings are now stored in UTF-8
>
>   - Z-encoding has been moved right to the back end. Previously we
>     used to Z-encode every identifier on the way in for simplicity,
>     and only decode when we needed to show something to the user.
>     Instead, we now keep every string in its UTF-8 encoding, and
>     Z-encode right before printing it out. To avoid Z-encoding the
>     same string multiple times, the Z-encoding is cached inside the
>     FastString the first time it is requested.
>
>     This speeds up the compiler - I've measured some definite
>     improvement in parsing at least, and I expect compilations overall
>     to be faster too. It also cleans up a lot of cruft from the
>     OccName interface. Z-encoding is nicely hidden inside the
>     Outputable instance for Names & OccNames now.
>
>   - StringBuffers are UTF-8 too, and are now represented as
>     ForeignPtrs.
>
>   - I've put together some test cases, not by any means exhaustive,
>     but there are some interesting UTF-8 decoding error cases that
>     aren't obvious. Also, take a look at unicode001.hs for a demo.

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]

_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime
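The log says characters are now classified using the Data.Char implementation, which is exactly what JM's remark about operators turns on. Here is a minimal sketch of how that classification and the four keyword synonyms could fit together in a lexer. It is not GHC's lexer. `isOperatorChar`, `synonyms` and `lexTokens` are hypothetical names. ASCII operator characters follow the Haskell report's ASCII symbol set. A non-ASCII character counts as an operator character when Data.Char's `isSymbol` or `isPunctuation` accepts it, which is roughly the report's uniSymbol. The file defines no `main`, so load it into GHCi to try it.

```haskell
import Data.Char (isAlphaNum, isAscii, isPunctuation, isSpace, isSymbol)
import Data.Maybe (fromMaybe)

-- Hypothetical: may this character appear in an operator?
-- ASCII: the Haskell report's ASCII symbol characters (including ':').
-- Non-ASCII: any Unicode symbol or punctuation, as classified by Data.Char.
isOperatorChar :: Char -> Bool
isOperatorChar c
  | isAscii c = c `elem` "!#$%&*+./<=>?@\\^|-~:"
  | otherwise = isSymbol c || isPunctuation c

-- The keyword synonyms listed in the log, mapped to their ASCII spellings.
synonyms :: [(String, String)]
synonyms =
  [ ("\x2200", "forall")  -- U+2200 FOR ALL
  , ("\x2192", "->")      -- U+2192 RIGHTWARDS ARROW
  , ("\x2190", "<-")      -- U+2190 LEFTWARDS ARROW
  , ("\x22EF", "..")      -- U+22EF (Unicode name: MIDLINE HORIZONTAL ELLIPSIS)
  ]

-- Hypothetical toy lexer: identifiers, maximal runs of operator characters
-- (with synonyms substituted afterwards) and single special characters
-- such as ( ) [ ] , which fall through to the last guard.
lexTokens :: String -> [String]
lexTokens [] = []
lexTokens s@(c : cs)
  | isSpace c        = lexTokens cs
  | isIdChar c       = let (t, rest) = span isIdChar s in t : lexTokens rest
  | isOperatorChar c =
      let (t, rest) = span isOperatorChar s
      in fromMaybe t (lookup t synonyms) : lexTokens rest
  | otherwise        = [c] : lexTokens cs
  where
    -- Unicode letters and digits are accepted in identifiers via isAlphaNum
    isIdChar x = isAlphaNum x || x == '_' || x == '\''
```

With this sketch, `lexTokens "f :: \x2200 a. a \x2192 a"` gives `["f","::","forall","a",".","a","->","a"]`. The synonym lookup runs only after a maximal run of operator characters has been collected, so a run such as `\x2192>` stays a single ordinary operator. It does not become `->` followed by `>`.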
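The log predicts that Latin-1 sources will usually fail with a UTF-8 decoding error. It also proposes recoding into UTF-8 before parsing as the way to support other encodings later. The following is a sketch of both points. For simplicity it works on byte lists (`[Word8]`). `latin1ToUtf8` and `validUtf8` are hypothetical helpers. The validator follows the UTF-8 well-formedness rules of RFC 3629 and does not come from GHC's StringBuffer code. A lone Latin-1 byte such as 0xE9 ('é') has the bit pattern of a three-byte UTF-8 lead byte, so the bytes that follow it are almost never valid continuation bytes.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Word (Word8)

-- Latin-1 maps each byte to the code point with the same value, so
-- recoding only has to UTF-8-encode code points 0x00..0xFF.
latin1ToUtf8 :: [Word8] -> [Word8]
latin1ToUtf8 = concatMap enc
  where
    enc b
      | b < 0x80  = [b]                        -- ASCII is unchanged
      | otherwise = [ 0xC0 .|. (b `shiftR` 6)  -- lead byte 110000xx
                    , 0x80 .|. (b .&. 0x3F) ]  -- continuation 10xxxxxx

-- Hypothetical strict validator: True iff the bytes are well-formed UTF-8.
-- Rejects stray continuation bytes, truncated sequences, overlong forms,
-- UTF-16 surrogates and code points above U+10FFFF.
validUtf8 :: [Word8] -> Bool
validUtf8 [] = True
validUtf8 (b : bs)
  | b < 0x80               = validUtf8 bs
  | b >= 0xC2 && b <= 0xDF = cont 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = cont 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = cont 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = False  -- 0x80..0xC1 or 0xF5..0xFF never start a character
  where
    -- n continuation bytes expected; acc holds the lead byte's payload bits;
    -- minCp is the smallest code point allowed for this length (no overlongs)
    cont :: Int -> Int -> Int -> Bool
    cont n acc minCp = case splitAt n bs of
      (cs, rest)
        | length cs == n && all isCont cs ->
            let cp = foldl (\a x -> a * 64 + fromIntegral (x .&. 0x3F)) acc cs
            in cp >= minCp && cp <= 0x10FFFF
                 && not (cp >= 0xD800 && cp <= 0xDFFF)
                 && validUtf8 rest
        | otherwise -> False
    isCont x = x .&. 0xC0 == 0x80
```

For example, the Latin-1 bytes for "café" end in 0xE9 and are rejected, while their recoding `[0x63, 0x61, 0x66, 0xC3, 0xA9]` is accepted. Overlong forms (0xC0 0xAF) and encoded surrogates (0xED 0xA0 0x80) are examples of malformed input that a lenient decoder can easily accept by mistake.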
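The log says Z-encoding now happens only at the back end and is cached inside the FastString the first time it is requested. In Haskell, one way to get that behaviour is a lazy record field. The encoding is stored as a thunk that runs once, on first use, and the result is then shared. This sketch makes two simplifying assumptions. First, `FastStr` is a hypothetical stand-in that holds a `String`, whereas the real FastString stores UTF-8 bytes. Second, `zEncode` covers only a subset of GHC's Z-encoding table, reproduced from memory. Treat it as illustrative, not as GHC's `zEncodeString`.

```haskell
import Data.Char (isAlphaNum, isAscii, isDigit, ord)
import Numeric (showHex)

-- Hypothetical stand-in for a FastString: the text plus its Z-encoding.
-- The second field is lazy, so it acts as a cache: it is computed at most
-- once, when first demanded, and shared by every later use of the value.
data FastStr = FastStr
  { fsText :: String
  , fsZEnc :: String
  }

mkFastStr :: String -> FastStr
mkFastStr s = FastStr s (zEncode s)  -- zEncode s is not evaluated here

-- Subset of the Z-encoding scheme (illustrative, from memory).
zEncode :: String -> String
zEncode = concatMap enc
  where
    enc c
      | c == 'z' = "zz"
      | c == 'Z' = "ZZ"
      | isAscii c && isAlphaNum c = [c]
      | otherwise = maybe (unicodeEsc c) id (lookup c table)
    table =
      [ ('(', "ZL"), (')', "ZR"), ('[', "ZM"), (']', "ZN"), (':', "ZC")
      , ('.', "zi"), ('+', "zp"), ('-', "zm"), ('>', "zg"), ('<', "zl")
      , ('_', "zu"), ('$', "zd"), ('=', "ze"), ('*', "zt") ]
    -- Anything else: 'z', the code point in hex, then 'U'; a leading '0'
    -- is added when the hex digits would otherwise start with a letter.
    unicodeEsc c = case showHex (ord c) "U" of
      h@(d : _) | isDigit d -> 'z' : h
      h                     -> 'z' : '0' : h
```

Here `zEncode "Data.Map"` gives `"DataziMap"`, and `zEncode "\x2192"` gives `"z2192U"`. `mkFastStr` never forces the second field, so a string that is never printed never pays for the encoding.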