Re: RFC: Unicode primes and super/subscript characters in GHC

Mikhail Vorozhtsov Mon, 16 Jun 2014 09:42:12 -0700

On 06/16/2014 04:26 AM, Mateusz Kowalczyk wrote:

On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:

Hello lists,


As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone to inconsistencies in their handling
versus the ASCII counterparts. For example, APOSTROPHE is treated
differently from PRIME:

λ> data a +' b = Plus a b
<interactive>:3:9:
      Unexpected type ‘b’
      In the data declaration for ‘+’
      A data declaration should have form
        data + a b c = ...
λ> data a +′ b = Plus a b

λ> let a' = 1
λ> let a′ = 1
<interactive>:10:8: parse error on input ‘=’

Also some rather bizarre looking things are accepted:

λ> let ᵤxᵤy = 1

In the spirit of improving things little by little I would like to propose:

1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
APOSTROPHE, meaning the following alterations to the lexer:

primes -> U+2032 | U+2033 | U+2034 | U+2057
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
graphic -> small | large | symbol | digit | special | " | ' | primes
varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes }

2. Introduce a new lexer nonterminal "subsup" that would include the
Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
Latin and Greek letters, and allow these characters to be used in names
and operators:

symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
digit -> ascDigit | uniDigit (EXCEPT subsup)
small -> ascSmall | uniSmall (EXCEPT subsup) | _
large -> ascLarge | uniLarge (EXCEPT subsup)
graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes | subsup }
varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
consym -> (: {symbol | subsup}) (EXCEPT reservedop)
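
To make the rules in items 1 and 2 concrete, here is a rough sketch of them
as plain Haskell predicates over strings. It is only an illustration, not
GHC's lexer and not the promised patch. The names isPrime, isSubSup, isVarId,
isConId and isVarSym are made up for this sketch, the subsup set is a small
incomplete sample, and uniSmall/uniLarge/uniDigit/uniSymbol are approximated
with Data.Char.

```haskell
import Data.Char (GeneralCategory (DecimalNumber), generalCategory,
                  isLower, isPunctuation, isSymbol, isUpper)

-- primes -> U+2032 | U+2033 | U+2034 | U+2057 (taken from item 1).
isPrime :: Char -> Bool
isPrime c = c `elem` "\x2032\x2033\x2034\x2057"

-- ASSUMPTION: a small, incomplete sample standing in for "subsup".
-- The full set would follow the Unicode sub/superscript list in [1].
isSubSup :: Char -> Bool
isSubSup c = c `elem` sample
  where
    sample = ['\x2080' .. '\x2089']       -- subscript digits 0-9
          ++ ['\x2074' .. '\x2079']       -- superscript digits 4-9
          ++ "\x2070\x00B9\x00B2\x00B3"   -- superscript 0, 1, 2, 3
          ++ "\x1D64"                     -- LATIN SUBSCRIPT SMALL LETTER U

-- small -> ascSmall | uniSmall (EXCEPT subsup) | _
isSmall :: Char -> Bool
isSmall c = (isLower c || c == '_') && not (isSubSup c)

-- large -> ascLarge | uniLarge (EXCEPT subsup)
isLarge :: Char -> Bool
isLarge c = isUpper c && not (isSubSup c)

-- digit -> ascDigit | uniDigit (EXCEPT subsup)
isDig :: Char -> Bool
isDig c = generalCategory c == DecimalNumber && not (isSubSup c)

-- symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
isSym :: Char -> Bool
isSym c = (isSymbol c || isPunctuation c)
       && c `notElem` "(),;[]`{}_\"'"
       && not (isPrime c)
       && not (isSubSup c)

-- Characters allowed after the first character of a varid or conid.
isIdTail :: Char -> Bool
isIdTail c = isSmall c || isLarge c || isDig c
          || c == '\'' || isPrime c || isSubSup c

reservedIds :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else", "foreign"
  , "if", "import", "in", "infix", "infixl", "infixr", "instance", "let"
  , "module", "newtype", "of", "then", "type", "where", "_" ]

reservedOps :: [String]
reservedOps = ["..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"]

-- varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
isVarId :: String -> Bool
isVarId s@(c : cs) = isSmall c && all isIdTail cs && s `notElem` reservedIds
isVarId []         = False

-- conid -> large { small | large | digit | ' | primes | subsup }
isConId :: String -> Bool
isConId (c : cs) = isLarge c && all isIdTail cs
isConId []       = False

-- varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
isVarSym :: String -> Bool
isVarSym s@(c : cs) =
  isSym c && c /= ':' && all (\x -> isSym x || isSubSup x) cs
    && s `notElem` reservedOps && not (isDashes s)
  where
    isDashes t = length t >= 2 && all (== '-') t
isVarSym [] = False
```

Under these predicates "a′" and "x₁" count as variable names just like "a'",
while "ᵤxᵤy" and the operator "+′" are rejected.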

If this proposal is received favorably, I'll write a patch for GHC based
on my previous stab at the problem[2].

P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
to the GHC users list.

[1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
[2] https://ghc.haskell.org/trac/ghc/ticket/5108
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

While personally I like the proposal (I have wanted prime and sub/sup
scripts way too many times), I worry about what this means for
compatibility: suddenly we'll have code that fails to build on 7.8 and
earlier because someone using 7.9/7.10+ used ′ somewhere. Even CPP
conditioned on the compiler version is not great in this scenario,
because it doesn't bring enough practical advantage to justify the CPP
clutter in code. If the choice is either extra lines due to CPP or using
‘'’ instead of ‘′’, I know which I'll go for.

Currently GHC categorizes Unicode PRIME as a "symbol", which means that
it is allowed to appear only in operators (varsym and consym). So yes,
if somebody is using things like "+′" or ":+′" (and they really
shouldn't), they would be hit by this change. Identifiers like "ᵤx"
would become illegal too. I'd be surprised to find an actual Hackage
library that does that though.


I also worry (although not based on anything particular you said)
whether this will change the meaning of any existing programs. Does it
only allow new programs?

As far as I can see, no change in meaning. Some hacky operators and some
hacky identifiers would become illegal. And some nicer ones would become
legal.


Will it be enabled by a pragma?


No, GHC accepts Unicode input without any pragmas.


I simply worry about how practical it will be to use for actual programs
and libraries that will go out on Hackage and the wider world, even if it
is accepted.
