Use the:
identifier : KEY1 | KEY2 | ... | ID ; But when you have: r: KEY1 ... | identifier ; You just need a one token predicate and lists the keyword stuff first. r: (KEY!)=>KEY1 ... | identifier ; I have built entire SQL parsers using this approach and SQL is probably the king of ambiguity in the regard. Jim > -----Original Message----- > From: [email protected] [mailto:antlr-interest- > [email protected]] On Behalf Of Ron Hunter-Duvar > Sent: Monday, July 19, 2010 12:33 PM > To: Cid Dennis > Cc: [email protected] > Subject: Re: [antlr-interest] Lex Matching Issues > > > Cid Dennis wrote: > > So I am new to ANTLR and have created a grammar but found a strange > issue. Because of the structure of the language I am parsing there can > be tokens that match reserved works as variables but only when they are > in a sub rule that does not use the reserved word. > > > > ... > Hi Cid, > > I had to deal with a similar situation, and neither of the solutions > suggested in the wiki worked for me > (http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741), and I > haven't seen any other solutions described in the email. After a lot of > experimentation I came up with a hybrid solution. > > One approach described is to define identifier like this: > > identifier : KEY1 | KEY2 | ... | ID ; > > The reason that this doesn't work is that where you have a production > that says "( KEY1 | identifier ) "Antlr reports a conflict and disables > the second alternative unless it can resolve the conflict with its > extended lookahead. So depending on which order you put them either > KEY1 > or identifier gets disabled, neither of which gives you a working > parser. The only way you would be able to make this work would be to > define a different identifier production for every different situation, > with the conflicting keywords removed. That might work for a handful of > different keywords, but doesn't scale. > > The other suggested solution is to create productions like this: > > keyIF : {input.LT(1).getText().equals("if")}? ID ; > keyCALL : {input.LT(1).getText().equals("call")}? ID ; > > This is even worse than the other solution. I don't think there is any > way to make this work at all. The problem is that the semantic > predicate > doesn't get applied when doing looking ahead in other productions, only > when actually trying to make the keyXXX productions. So everywhere you > have "( key1 | identifier )", to Antlr this looks like "( ID | ID )". > This completely throws the lookahead for a loop. In many cases it will > actually generate a parser without warnings, but the actual lookahead > it's doing is just wrong. For example, I had a case something like > "(keyA keyB LEFT_BRACKET) | (keyC keyD keyE)". To Antler's lookahead > analysis this said "(ID ID LEFT_BRACKET)|(ID ID ID)", so it generated > lookahead that basically said "if (LT(3) == LEFT_BRACKET) option1 else > option2". So it was taking other things like "C D (" and trying to > parse > them as "A B (", then of course throwing run time exceptions on the > failed semantic predicates. > > The hybrid approach I came up with was to define the keywords as > separate token types and define an identifier rule like in the first > approach, but then also create keyword productions similar to the > second > approach: > > keyIF: {input.LT(1).getText().equals("if")}? IF ; > > I know at first glance the semantic predicate seems pointless. But it > causes Antlr to ignore the conflicts that it would have reported > otherwise, resulting in a working grammar. > > There are just two caveats with this approach. First, it can also > suppress reporting of real conflicts, causing bugs in your parser. To > avoid this it's a good idea to first develop and test the grammar > without the semantic predicates in the keyword rules and without the > keywords as alternatives in the identifier rule. Once you have the > grammar working this way with no conflicts, then it's safe to put the > other rules that handle the non-reserved keywords. > > Second, all those semantic predicates can cause the size of the > generated parser to blow up. It can cause syntactic predicates to have > a > lot of special states and make the DFAs huge. This may only be a > problem > with the Java runtime, where the JVM imposes extreme size limits on > class size (64KB). With large grammars or complex lookahead situations > it can result in an uncompilable Java class being generated. > > Hope this helps, > > Ron > > -- > Ron Hunter-Duvar | Software Developer V | 403-272-6580 > Oracle Service Engineering > Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5 > > All opinions expressed here are mine, and do not necessarily represent > those of my employer. > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
