Hi,

I was recently looking at this code to see how the lexer decides that a
character is a letter, space, etc. The problem is that with Unicode
there are hundreds of thousands of characters that are declared to be
alphanumeric. Even if they are compressed into a regular expression
with a list of ranges, there will still be ~390 ranges. The GHC lexer
avoids hardcoding these ranges by calling isSpace, isAlpha, etc. and
then converting the result to a code. Ideally Alex would have
predefined macros corresponding to the Unicode categories, but for now
you have to either hard-code the ranges with huge regular expressions
or use the workaround used in GHC. Is there any other solution?
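
To make the workaround concrete, here is a rough sketch of the idea. It
is not the actual code from Lexer.x. The name squash and all placeholder
values except '\x05' for space (which is in the quoted excerpt below)
are my own assumptions, picked only for illustration.

```haskell
module Squash where

import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Placeholder characters standing in for whole Unicode classes.
-- Only '\x05' (space) comes from the quoted Lexer.x excerpt; the
-- other values are hypothetical.
uniOther, uniUpper, uniLower, uniDigit, uniSymbol, uniSpace :: Char
uniOther  = '\x00'
uniUpper  = '\x01'
uniLower  = '\x02'
uniDigit  = '\x03'
uniSymbol = '\x04'
uniSpace  = '\x05'

-- Map a character to what the lexer's automaton should see.
-- Printable ASCII passes through unchanged; anything above '\x7f'
-- is collapsed to the placeholder for its class.
squash :: Char -> Char
squash c
  | c <= '\x06' = uniOther   -- keep real control chars off the placeholders
  | c <= '\x7f' = c
  | isSpace c   = uniSpace
  | otherwise   = case generalCategory c of
      UppercaseLetter -> uniUpper
      TitlecaseLetter -> uniUpper
      LowercaseLetter -> uniLower
      OtherLetter     -> uniLower
      DecimalNumber   -> uniDigit
      MathSymbol      -> uniSymbol
      CurrencySymbol  -> uniSymbol
      ModifierSymbol  -> uniSymbol
      OtherSymbol     -> uniSymbol
      _               -> uniOther
```

With something like this, the .x file only needs to mention the
placeholders (as $unispace does below) instead of hundreds of ranges.
The cost is that the real character has to be recovered from the input
later, since the automaton only ever sees the placeholder.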

Regards,
  Krasimir


2014/1/7 Carter Schonwald <[email protected]>:
> you're probably right, this could be regarded as dead code for ghc 7.8 (esp
> since alex and happy must be the recent versions to even build ghc HEAD ! )
>
>
> On Tue, Jan 7, 2014 at 2:25 AM, Mateusz Kowalczyk <[email protected]>
> wrote:
>>
>> Greetings,
>>
>> When looking at the GHC lexer (Lexer.x), there's:
>>
>> > $unispace    = \x05 -- Trick Alex into handling Unicode. See alexGetChar.
>> > $whitechar   = [\ \n\r\f\v $unispace]
>> > $white_no_nl = $whitechar # \n
>> > $tab         = \t
>>
>> Scrolling down to alexGetChar and alexGetChar', we see the comments:
>>
>>
>> > -- backwards compatibility for Alex 2.x
>> > alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
>> >
>> > -- This version does not squash unicode characters, it is used when
>> > -- lexing strings.
>> > alexGetChar' :: AlexInput -> Maybe (Char,AlexInput)
>>
>> What's the reason for these? I was under the impression that since
>> 3.0, Alex has natively supported unicode. Is it just dead code? Could
>> all the hex $uni* functions be removed? If not, why not?
>>
>> --
>> Mateusz K.
>> _______________________________________________
>> ghc-devs mailing list
>> [email protected]
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>