On Tue, Mar 20, 2012 at 5:37 PM, Iavor Diatchki <iavor.diatc...@gmail.com> wrote:
> Hello,
>
> So I looked at what GHC does with Unicode and to me it seems quite
> reasonable:
>
> * The alphabet is Unicode code points, so a valid Haskell program is
> simply a list of those.
> * Combining characters are not allowed in identifiers, so no need for
> complex normalization rules: programs should always use the "short"
> version of a character, or be rejected.
> * Combining characters may appear in string literals, and there they
> are left "as is" without any modification (so some string literals may
> be longer than what's displayed in a text editor.)
>
> Perhaps this is simply what the report already states (I haven't
> checked, for which I apologize) but, if not, perhaps we should clarify
> things.
>
> -Iavor
> PS: I don't think that there is any need to specify a particular
> representation for the unicode code-points (e.g., utf-8 etc.) in the
> language standard.
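
To make the string-literal point concrete, here is a minimal sketch,
assuming GHC and Data.Char from base.  The names `precomposed` and
`decomposed` are mine, chosen only for illustration; this is not taken
from the Report or from the GHC sources.

```haskell
import Data.Char (isMark)

-- "é" written as the single precomposed code point U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: String
decomposed = "e\x0301"

main :: IO ()
main = do
  -- A String is a list of Char, and each Char is one code point,
  -- so the two spellings have different lengths.
  print (length precomposed)         -- 1
  print (length decomposed)          -- 2
  -- No normalization is applied, so the literals compare unequal.
  print (precomposed == decomposed)  -- False
  -- Only the second code point of the decomposed form is a mark.
  print (map isMark decomposed)      -- [False,True]
```

Both literals display as the same glyph in most editors, which is
exactly the "longer than what's displayed" case described above.
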
Thanks Iavor.  If the Report intended to talk about code points only
(and indeed ruling out normalization suggests that), then the Report
needs to be clarified.  As you know, there is a distinction between a
Unicode code point and a Unicode character:

    http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf#G25564

Until I sent my original query, I had been reading the Report as
meaning Unicode characters (as the grammar seemed to suggest), but it
is now clear to me that only code points were intended.  Your
investigation of the GHC code base seems to confirm that.

-- Gaby
_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime