>>>>> "Manuel M. T. Chakravarty" <[EMAIL PROTECTED]> (MMTC) writes:
MMTC> [EMAIL PROTECTED] (Marcin 'Qrczak' Kowalczyk) wrote,
>> Wed, 27 Sep 2000 00:22:05 +1100, Manuel M. T. Chakravarty <[EMAIL PROTECTED]>
>pisze:
>>
>> > Hmm, this seems like a shortcoming in the Haskell spec. We have all
>> > these isAlpha, isDigit, etc functions, but I can't get at a list of,
>> > say, all characters for which isAlpha is true.
>>
>> You can: filter isAlpha ['\0'..'\xFFFF']
>> (don't use maxBound here because it's too large and we know that
>> currently there are no isAlpha characters outside this range).
>>
>> Working on large explicit lists is inefficient. 45443 characters
>> are isAlpha. A lexer should be designed to avoid using a full list.
MMTC> You are right, just having a list of the characters is too
MMTC> naive an approach. But this reinforces my point: we need
MMTC> an _efficient_ way of getting at the Unicode ranges for
MMTC> certain character classes. H98 seems to be lacking some
MMTC> features for practical use of Unicode - the header of the
MMTC> standard library `Char' actually admits as much.
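For concreteness, here is a minimal sketch (my own, not anything from H98 or
a particular library) of how such ranges could be computed once from a
predicate like isAlpha, by collapsing the full character list into maximal
contiguous ranges. The names `ranges` and `merge` are made up for this
illustration:

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper: compress the characters satisfying a predicate
-- into maximal contiguous (lo, hi) ranges, so a lexer need not keep
-- the full 45443-element list around.
ranges :: (Char -> Bool) -> [(Char, Char)]
ranges p = merge [ (c, c) | c <- ['\0' .. '\xFFFF'], p c ]
  where
    -- Adjacent singleton ranges are fused while they stay contiguous.
    merge ((a, b) : (c, d) : rs)
      | fromEnum c == fromEnum b + 1 = merge ((a, d) : rs)
    merge (r : rs) = r : merge rs
    merge []       = []

main :: IO ()
main = print (take 3 (ranges isAlpha))
```

With this, e.g. ('A','Z') and ('a','z') come out as single ranges rather
than 26 separate characters each.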
Doaitse Swierstra's [this is the correct spelling!] parser combinators in
their newest incarnation have symbol ranges as their basis. Internally, the
ranges also allow binary search, which is the primary reason for their
speed. There are now also facilities for writing scanners using these
combinators. With the ranges, parsing Unicode shouldn't be any less
efficient than parsing ASCII.
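To illustrate the idea (this is my own sketch, not the actual code of those
combinators), classifying a character against a sorted, non-overlapping
array of ranges by binary search costs O(log n) per character regardless of
how many code points the class contains. The names `inRanges` and
`alphaRanges` are invented for the example:

```haskell
import Data.Array (Array, listArray, bounds, (!))

-- Hypothetical membership test: binary search over a sorted array of
-- non-overlapping (lo, hi) character ranges.
inRanges :: Array Int (Char, Char) -> Char -> Bool
inRanges a c = go lo0 hi0
  where
    (lo0, hi0) = bounds a
    go lo hi
      | lo > hi   = False                      -- range exhausted: not a member
      | c < l     = go lo (mid - 1)            -- look in the lower half
      | c > h     = go (mid + 1) hi            -- look in the upper half
      | otherwise = True                       -- l <= c <= h: found
      where
        mid    = (lo + hi) `div` 2
        (l, h) = a ! mid

-- A toy range table covering just the ASCII letters.
alphaRanges :: Array Int (Char, Char)
alphaRanges = listArray (0, 1) [('A', 'Z'), ('a', 'z')]

main :: IO ()
main = print (inRanges alphaRanges 'q', inRanges alphaRanges '!')
-- prints (True,False)
```

The same lookup works unchanged whether the table holds 2 ASCII ranges or
the full set of Unicode letter ranges, which is why range-based scanning
need not slow down for Unicode input.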
--
Piet van Oostrum <[EMAIL PROTECTED]>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: [EMAIL PROTECTED]