On Mon, Mar 19, 2012 at 4:34 AM, Simon Marlow <simon...@microsoft.com> wrote:
>> On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh <ig...@earth.li> wrote:
>> > Hi Gaby,
>> >
>> > On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>> >>
>> >> OK, thanks! I guess a take away from this discussion is that what is
>> >> a punctuation is far less well defined than it appears...
>> >
>> > I'm not really sure what you're asking. Haskell's uniSymbol includes
>> > all Unicode characters (should that be codepoints? I'm not a Unicode
>> > expert) in the punctuation category; I'm not sure what the best
>> > reference is, but e.g. table 12 in
>> > http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
>> > lists a number of Px categories, and a meta-category P "Punctuation".
>> >
>> >
>> > Thanks
>> > Ian
>> >
>>
>> Hi Ian,
>>
>> I guess what I am asking was partly summarized in Iavor's message.
>>
>> For me, the issue started with bullet number 4 in section 1.1
>>
>> http://www.haskell.org/onlinereport/intro.html#sect1.1
>>
>> which states that:
>>
>>     The lexical structure captures the concrete representation
>>     of Haskell programs in text files.
>>
>> That combined with the opening section 2.1 (e.g. example of terminal
>> syntax) and the fact that the grammar routinely described two
>> non-terminals ascXXX (for ASCII characters) and uniXXX for (Unicode
>> character) suggested that the concrete syntax of Haskell programs in
>> text files is in ASCII charset. Note this does not conflict with the
>> general statement that Haskell programs use the Unicode character
>> because the uniXXX could use the ASCII charset to introduce Unicode
>> characters -- this is not uncommon practice for programming languages
>> using Unicode characters; see the link I gave earlier.
>>
>> However, if I understand Malcolm's message correctly, this is not the
>> case.
>> Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
>> representation of Haskell programs in text files. What it does is to
>> capture the structure of what is obtained from interpreting, *in some
>> unspecified encoding or unspecified alphabet*, the concrete
>> representation of Haskell programs in text files. This conclusion is
>> unfortunate, but I believe it is correct.
>> Since the encoding or the alphabet is unspecified, it is no longer
>> necessarily the case that two Haskell implementations would agree on the
>> same lexical interpretation when presented with the same exact text file
>> containing a Haskell program.
>>
>> In its current form, you are correct that the Report should say
>> "codepoint" instead of characters.
>>
>> I join Iavor's request in clarifying the alphabet used in the grammar.
>
> The report gives meaning to a sequence of codepoints only, it says nothing
> about how that sequence of codepoints is represented as a string of bytes in
> a file, nor does it say anything about what those files are called, or even
> whether there are files at all.

Thanks, Simon.

The fact that the Report is silent about the encoding used to represent
concrete Haskell programs in text files adds a certain level of
non-portability (and confusion).

I found last night that a proposal has been made to add some support
for encoding specification:

http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource

I believe that is a good start. What are the odds of it being
considered for Haskell 2012? I suspect the pragma proposal works only
if something is said about the position of that pragma in the source
file (e.g. it must be the first line, or within the first N bytes of the
source file); otherwise we have an infinite descent.

>
> Perhaps some clarification is in order in a future revision, and we should
> use the correct terminology where appropriate. We should also clarify that
> "punctuation" means exactly the Punctuation class.

That would be great. Do you have any comment about the
UnicodeInHaskellSource proposal?

> With regards to normalisation and equivalence, my understanding is that
> Haskell does not support either: two identifiers are equal if and only if
> they are represented by the same sequence of codepoints. Again, we could add
> a clarifying sentence to the report.
>

Ugh. Writing a parser for Haskell was an interesting exercise :-)

-- Gaby

_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime
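
For readers following the thread, here is a minimal sketch of two points
discussed above: that "punctuation" can be read as exactly the Unicode
P* general categories, and that identifier equality is plain equality of
codepoint sequences with no normalisation. The names isPunctuationClass
and sameIdentifier are hypothetical helpers invented for this
illustration; they are not taken from the Report or from any compiler,
and the sketch uses only Data.Char from base.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- Hypothetical helper: True when the character's Unicode general
-- category is one of the seven P* (Punctuation) categories.
isPunctuationClass :: Char -> Bool
isPunctuationClass c =
  generalCategory c `elem`
    [ ConnectorPunctuation, DashPunctuation, OpenPunctuation
    , ClosePunctuation, InitialQuote, FinalQuote, OtherPunctuation ]

-- Hypothetical helper: identifier equality as described in the thread,
-- i.e. the same sequence of codepoints, with no normalisation step.
sameIdentifier :: String -> String -> Bool
sameIdentifier = (==)

main :: IO ()
main = do
  let precomposed = "caf\xE9"    -- ends in U+00E9
      decomposed  = "cafe\x301"  -- ends in U+0065 U+0301
  -- The two strings render alike but are different codepoint sequences.
  print (map ord precomposed)
  print (map ord decomposed)
  print (sameIdentifier precomposed decomposed)
  -- '!' and '_' are in P* categories; '+' (a math symbol) and 'a' are not.
  print (map isPunctuationClass "!+_a")
```

Under this reading, the two spellings above would be distinct
identifiers even though they are canonically equivalent in Unicode
terms, which is the behaviour Simon describes.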