Note that matching in terms of UPPER case is generally a bad idea. Some
languages have characters that never appear at the start of a word. Upper case
has come to be used mainly to mark the start of words in particular contexts,
so such characters need not have a proper upper-case mapping. The German ß is
the best-known such character among languages with Latin-based character sets,
but it is not the only example. However, if a language has a notion of case,
there is always a mapping to lower case, and for simple case folding that
mapping is the one to prefer.
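For illustration, here is a minimal, ANTLR-independent sketch of the lookahead-folding approach discussed further down the thread. It folds to lower case, as suggested above. The class and its methods (LowerCaseLookaheadStream, la, matchKeyword, text) are hypothetical and are not ANTLR's API. A real ANTLR 3 version would presumably subclass the runtime's character stream and override its lookahead method instead. Keywords are matched against lower-cased lookahead, while a token's text is still taken from the unmodified input, so identifiers keep their original case.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of a case-insensitive lookahead stream: la() reports
 * lower-cased characters for matching, while text() returns the original,
 * case-preserved input.
 */
public class LowerCaseLookaheadStream {
    private final String data;
    private int p = 0; // index of the next character to consume

    public LowerCaseLookaheadStream(String data) {
        this.data = data;
    }

    /** i-th character of lookahead (1-based), folded to lower case; -1 at end. */
    public int la(int i) {
        int idx = p + i - 1;
        if (idx < 0 || idx >= data.length()) return -1;
        return Character.toLowerCase(data.charAt(idx));
    }

    public void consume() {
        if (p < data.length()) p++;
    }

    public int index() {
        return p;
    }

    /** Original (unfolded) text between start (inclusive) and stop (exclusive). */
    public String text(int start, int stop) {
        return data.substring(start, stop);
    }

    /** Matches a keyword written in lower case; consumes it only on success. */
    public boolean matchKeyword(String lowerKeyword) {
        for (int i = 0; i < lowerKeyword.length(); i++) {
            if (la(i + 1) != lowerKeyword.charAt(i)) return false;
        }
        for (int i = 0; i < lowerKeyword.length(); i++) consume();
        return true;
    }

    public static void main(String[] args) {
        LowerCaseLookaheadStream in = new LowerCaseLookaheadStream("SeLeCt Name");
        int start = in.index();
        System.out.println(in.matchKeyword("select"));  // true
        System.out.println(in.text(start, in.index())); // SeLeCt

        // Why lower case: 'ß' has no single-character upper-case mapping,
        // and full upper-casing changes the length of the text.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
        System.out.println("\u00DF".toUpperCase(Locale.ROOT));          // SS
    }
}
```

Per-character Character.toLowerCase is simple case mapping, not full case folding. That is adequate for keywords written in ASCII, but identifiers may need the kind of normalization discussed in UAX #31.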

In many ways the problem of dealing with case is similar to the problem of
dealing with normalization, where the same character can be represented by more
than one sequence of code points. As part of its treatment of normalization,
the Unicode Consortium recommended a couple of straightforward ways for
programming languages to deal with identifier uniqueness. These are covered in
"Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax":
http://www.unicode.org/reports/tr31/
They can be implemented directly in terms of the Unicode character property
tables, and it is a small matter of programming to implement the corresponding
lexical classes for identifiers.
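Here is a minimal sketch of both ideas, assuming Java. Character.isUnicodeIdentifierStart and isUnicodeIdentifierPart stand in for the UAX #31 identifier classes. Java's definitions are close to the annex's default identifier syntax but are not guaranteed to be identical. The hypothetical key method approximates a normalized, case-insensitive comparison key by applying NFKC, then lower-casing, then NFKC again. That approximation is an assumption: the Unicode Standard defines an exact mapping for this purpose (NFKC_Casefold), which Java's standard library does not expose directly.

```java
import java.text.Normalizer;
import java.util.Locale;

public class IdentifierKey {
    // Lexical check using Java's Unicode identifier predicates, which
    // approximate (not necessarily exactly) the UAX #31 default syntax.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    // Hypothetical comparison key (an approximation of NFKC_Casefold):
    // normalize, fold to lower case, then normalize again.
    static String key(String s) {
        String n = Normalizer.normalize(s, Normalizer.Form.NFKC);
        String lower = n.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";    // é as a single code point
        String decomposed = "cafe\u0301"; // e followed by a combining acute accent
        System.out.println(isIdentifier(composed));                   // true
        System.out.println(isIdentifier(decomposed));                 // true
        System.out.println(key(composed).equals(key(decomposed)));    // true
        System.out.println(key("Caf\u00C9").equals(key(composed)));   // true
    }
}
```

The two spellings of "café" are distinct code-point sequences, yet they map to the same key. A symbol table keyed on key(...) would therefore treat them as one identifier while the lexer still reports the original text.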

On Jun 6, 2011, at 4:56 PM, Jim Idle wrote:

> No, that is not correct, please look at the WIKI article. The input stream
> merely MATCHES in upper case, it does NOT change the input stream itself,
> hence both the keywords and anything else are case preserved when you ask
> for their text; that is the whole point of doing it that way. Then you
> specify the tokens in the lexer using upper case only and it has the side
> effect of simplifying the lexer rules as well as not creating a method
> call to match every letter of every keyword (which is a bad idea even with
> JIT inlining).
> 
> Jim
> 
>> -----Original Message-----
>> From: [email protected] [mailto:antlr-interest-
>> [email protected]] On Behalf Of Douglas Godfrey
>> Sent: Monday, June 06, 2011 12:41 PM
>> To: Marco Hunsicker
>> Cc: [email protected]
>> Subject: Re: [antlr-interest] New Guy Question...
>> 
>> When you implement case insensitive keywords, you may still want case
>> sensitive identifiers.
>> If the input stream does case folding, you can't use case sensitive
>> identifiers.
>> 
>> On Sun, Jun 5, 2011 at 5:58 PM, Marco Hunsicker <[email protected]>
>> wrote:
>> 
>>>> You have to handle case insensitivity the hard way:
>>>> 
>>>> fragment A
>>>>     :    'A' | 'a';
>>>> 
>>>> [...]
>>> 
>>> I don't think it's a necessity to do it this way. Actually, I think it
>>> would be better using a specialized input stream that does any
>>> necessary transformation. Your mileage may vary ;)
>>> 
>>> Cheers,
>>> 
>>> Marco
>>> 
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe:
>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>> 


-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.
