The article describes a general method, not a universal solution. If you have a language where such semantics apply, you will need a specific solution. In general these semantics are ignored for programming languages though, so this is somewhat pedantic.
Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of William Clodius
> Sent: Wednesday, June 08, 2011 10:44 PM
> To: antlr-interest interest
> Subject: Re: [antlr-interest] New Guy Question...
>
> Note that matching in terms of UPPER case is generally a bad idea.
> There are languages with characters that do not appear at the start of
> words. As upper case has come to be primarily used to indicate the
> start of words in selective contexts, such characters need not have a
> proper mapping to upper case. The German ß is the best known such
> character in languages with latin based character sets, but it is not
> the only such example. However if a language has a notion of case,
> there is always a mapping to lower case and for simple case folding
> that is to be preferred.
>
> In many ways the problem of dealing with case is similar to the problem
> of dealing with normalization, where the same character can be
> represented by more than one combination of code points. As part of its
> process of dealing with normalization, for programming languages the
> UNICODE consortium recommended a couple of straightforward means of
> dealing identifier uniqueness. These are covered in "Unicode Standard
> Annex #31, Unicode Identifier and Pattern Syntax"
> http://www.unicode.org/reports/tr31/
> These have a straightforward implementation in terms of the UNICODE
> character property tables, and it is a small matter of programming to
> implement their lexical classes for identifiers.
>
> On Jun 6, 2011, at 4:56 PM, Jim Idle wrote:
>
> > No, that is not correct, please look at the WIKI article. The input
> > stream merely MATCHES in upper case, it does NOT change the input
> > stream itself, hence both the keywords and anything else are case
> > preserved when you ask for their text; that is the whole point of
> > doing it that way. Then you specify the tokens in the lexer using
> > upper case only and it has the side effect of simplifying the lexer
> > rules as well as not creating a method call to match every letter of
> > every keyword (which is a bad idea even with JIT inlining).
> >
> > Jim
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:antlr-interest-
> >> [email protected]] On Behalf Of Douglas Godfrey
> >> Sent: Monday, June 06, 2011 12:41 PM
> >> To: Marco Hunsicker
> >> Cc: [email protected]
> >> Subject: Re: [antlr-interest] New Guy Question...
> >>
> >> When you implement case insensitive keywords, you may still want case
> >> sensitive identifiers.
> >> If the input stream does case folding, you can't use case sensitive
> >> identifiers.
> >>
> >> On Sun, Jun 5, 2011 at 5:58 PM, Marco Hunsicker <[email protected]>
> >> wrote:
> >>
> >>>> You have to handle case insensitivity the hard way:
> >>>>
> >>>> fragment A
> >>>> : 'A' | 'a';
> >>>>
> >>>> [...]
> >>>
> >>> I don't think it's a necessity to do it this way. Actually, I think it
> >>> would be better using a specialized input stream that does any
> >>> necessary transformation. Your mileage may vary ;)
> >>>
> >>> Cheers,
> >>>
> >>> Marco

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups "il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
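
For reference, the idea discussed in the thread (match in upper case, leave the input text untouched) can be sketched in plain Java without the ANTLR runtime. This is only an illustration under assumptions: the class CaseFoldingView and its methods lookahead, matchesKeyword and text are hypothetical names, not the ANTLR API and not the code from the wiki article. The last two lines of main illustrate the ß caveat raised in the thread, using the standard Character and String case-mapping methods.

```java
import java.util.Locale;

/** Hypothetical stand-in for a lexer input stream; NOT the ANTLR API. */
final class CaseFoldingView {
    private final String source;

    CaseFoldingView(String source) {
        this.source = source;
    }

    /** Character the matcher sees at index i: upper-cased for comparison only. */
    char lookahead(int i) {
        return Character.toUpperCase(source.charAt(i));
    }

    /** True if an upper-case keyword matches at offset, whatever the input's case. */
    boolean matchesKeyword(int offset, String upperKeyword) {
        if (offset < 0 || offset + upperKeyword.length() > source.length()) {
            return false;
        }
        for (int k = 0; k < upperKeyword.length(); k++) {
            if (lookahead(offset + k) != upperKeyword.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    /** Original text of [start, end); the stored input is never modified. */
    String text(int start, int end) {
        return source.substring(start, end);
    }

    public static void main(String[] args) {
        CaseFoldingView in = new CaseFoldingView("select Foo");
        System.out.println(in.matchesKeyword(0, "SELECT")); // keyword matched in upper case
        System.out.println(in.text(0, 6));                  // keyword text keeps its case
        System.out.println(in.text(7, 10));                 // identifier text keeps its case

        // Sharp s (U+00DF): the single-character upper-case mapping returns the
        // character unchanged, while the full string mapping expands to two letters.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF');
        System.out.println("\u00DF".toUpperCase(Locale.ROOT));
    }
}
```

Because only lookahead folds case, a lexer built this way can still treat identifiers as case sensitive when it reads their text, which is the distinction drawn in the quoted exchange. For a language where the upper-case mapping is not adequate, the fold inside lookahead would have to be replaced with something specific to that language.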
