On 07/01/14 14:38, Simon Marlow wrote:
> Krasimir is right, it would be hard to use Alex's built-in Unicode
> support because we have to automatically generate the character classes
> from the Unicode spec somehow. Probably Alex ought to include these as
> built-in macros, but right now it doesn't.
>
> Even if we did have access to the right regular expressions, I'm
> slightly concerned that the generated state machine might be enormous.
>
> Cheers,
> Simon
>
> On 07/01/2014 08:26, Krasimir Angelov wrote:
>> Hi,
>>
>> I was recently looking at this code to see how the lexer decides that
>> a character is a letter, space, etc. The problem is that with Unicode
>> there are hundreds of thousands of characters that are declared to be
>> alphanumeric. Even if they are compressed into a regular expression
>> with a list of ranges, there will still be ~390 ranges. The GHC lexer
>> avoids hardcoding these ranges by calling isSpace, isAlpha, etc. and
>> then converting the result to a class code. Ideally it would be nice
>> if Alex had predefined macros corresponding to the Unicode categories,
>> but for now you have to either hard-code the ranges with huge regular
>> expressions or use the workaround used in GHC. Is there any other
>> solution?
>>
>> Regards,
>> Krasimir
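[The workaround Krasimir describes can be sketched as follows. Instead of feeding the lexer real Unicode code points, each character is classified with the Data.Char predicates and mapped to a small fake "byte" standing for its character class, which the generated automaton can match against. The function name and class codes below are illustrative only, not GHC's actual encoding.]

```haskell
import Data.Char (isAlpha, isNumber, isSpace, isSymbol)
import Data.Word (Word8)

-- Map a character to a pseudo-byte for the lexer. ASCII passes
-- through unchanged; everything else collapses to one code per
-- class, so the state machine never sees the ~390 Unicode ranges.
-- The codes 0x80..0x84 are made up for this sketch; a real lexer
-- would pick codes matching the macros declared in its .x file.
charClass :: Char -> Word8
charClass c
  | c <= '\x7f' = fromIntegral (fromEnum c)  -- plain ASCII
  | isSpace c   = 0x80                       -- Unicode whitespace
  | isAlpha c   = 0x81                       -- Unicode letters
  | isNumber c  = 0x82                       -- Unicode numeric chars
  | isSymbol c  = 0x83                       -- Unicode symbols
  | otherwise   = 0x84                       -- anything else
```

[In GHCi, `charClass 'λ'` gives 129 (the letter code 0x81) and `charClass 'a'` gives 97, i.e. plain ASCII is untouched. A modern Alex lexer would call something like this from its alexGetByte implementation.]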
Ah, I think I understand now. If this is the case, at least the
‘alexGetChar’ could be removed, right? Is Alex 2.x compatibility
necessary for any reason whatsoever?

--
Mateusz K.

_______________________________________________
ghc-devs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/ghc-devs
