Thank you, Gavin, for taking the time to reply.

> Am I supposed to write an initialization routine that builds a dictionary?

So, this is what I have to do.
In my CSharp2 target, both components needed for this dictionary *already* exist: the string values of the tokens and the corresponding integer token types. It appears I have to duplicate some of that to build the dictionary, which is OK, but surprising, since the ANTLR docs and publications stress efficiency; i.e., it seems the target could have organized this data so as to provide the mapping directly, rather than requiring manual duplication.

Just thinking out loud, not complaining... overall, I'm loving ANTLR. :-)

Regards,
Ben

----- Original Message -----
From: "Gavin Lambert" <[EMAIL PROTECTED]>
To: "Ben Gillis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, October 31, 2008 9:52 PM
Subject: Re: [antlr-interest] QUESTION on: How do I handle abbreviated keywords?

> At 14:00 1/11/2008, Ben Gillis wrote:
>>see http://www.antlr.org/wiki/pages/viewpage.action?pageId=1802308.
>>
>>It's not clear to me the connection between the tokens block (and its
>>auto-gen'd code), and this statement in the above URL:
>>
>>"might simply consult an IDictionary<string,int> map of all keywords (incl
>>abbreviations). "
>>
>>The tokens block ends up in a string array named tokenNames (CSharp2
>>target). My tokens keywords are mixed with other entries related to the
>>grammar definition.
>>
>>Am I supposed to write an initialization routine that builds a dictionary?
>>If so, I have to filter through the auto-gen'd tokenNames making sure to
>>enter only my keywords, otherwise I'll get false hits in my
>>CheckKeywordsTable routine. Somehow, this doesn't seem quite right, ???
>
> The tokenNames array is a list of token *names*, which is useless for that
> purpose, since for that particular keyword matching strategy what you're
> after is a mapping of keyword *text* to token *value*, which is an
> entirely different thing.
>
> Say you have the keywords "begin", "end", and "while". Your tokens block
> declares imaginary token types like this:
>
> tokens {
>   BEGIN;
>   END;
>   WHILE;
> }
>
> These carry no text and can't do any matching by themselves, but they *do*
> allocate a token ID for them. In your lexer's constructor, you
> additionally set up a dictionary mapping like so:
>
>   keywordTable.Add("begin", BEGIN);
>   keywordTable.Add("end", END);
>   keywordTable.Add("while", WHILE);
>
> Then in the CheckKeywordsTable function you use that mapping to return the
> "real" token type, be that one listed in the table or the catch-all
> IDENTIFIER (when it doesn't look like a keyword). For more complicated
> cases you may need to do something smarter than a dictionary lookup, but
> that's up to you.
>
> (To handle abbreviations, which is what that page is primarily focused on,
> then obviously you'll have to add the valid abbreviations of the keywords
> to the table as well.)
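
For reference, the keyword-table approach described in the quoted reply could be sketched roughly as follows. This is a standalone illustration rather than ANTLR-generated code: the KeywordTable class name, the numeric token type values, the "beg" abbreviation, and the exact CheckKeywordsTable signature are all assumptions made for the example. In a real CSharp2 lexer the BEGIN/END/WHILE/IDENTIFIER constants would come from the generated lexer class, and the table would be filled in the lexer's constructor.

```csharp
using System;
using System.Collections.Generic;

// Standalone sketch; not ANTLR-generated code.
public static class KeywordTable
{
    // Hypothetical token type values. In a generated lexer these
    // constants are produced from the grammar (tokens block and rules).
    public const int IDENTIFIER = 4;
    public const int BEGIN = 5;
    public const int END = 6;
    public const int WHILE = 7;

    // Maps keyword *text* (including abbreviations) to token *type*.
    private static readonly Dictionary<string, int> keywordTable =
        new Dictionary<string, int>();

    static KeywordTable()
    {
        keywordTable.Add("begin", BEGIN);
        keywordTable.Add("beg", BEGIN);   // hypothetical abbreviation
        keywordTable.Add("end", END);
        keywordTable.Add("while", WHILE);
    }

    // Returns the keyword's token type if the text is in the table,
    // otherwise the catch-all IDENTIFIER type.
    public static int CheckKeywordsTable(string text)
    {
        int type;
        return keywordTable.TryGetValue(text, out type) ? type : IDENTIFIER;
    }
}
```

With this sketch, KeywordTable.CheckKeywordsTable("while") yields WHILE, "beg" yields BEGIN, and any text not in the table (say "counter") falls through to IDENTIFIER. Note that the table is built by hand from keyword text, not derived from tokenNames, which only holds token names.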
