Cid Dennis wrote:
> So I am new to ANTLR and have created a grammar but found a strange issue.
> Because of the structure of the language I am parsing there can be tokens
> that match reserved works as variables but only when they are in a sub rule
> that does not use the reserved word.
>
> ...

Hi Cid,

I had to deal with a similar situation, and neither of the solutions suggested in the wiki worked for me (http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741). I haven't seen any other solutions described in the list emails either. After a lot of experimentation I came up with a hybrid solution.

One approach described there is to define identifier like this:

    identifier : KEY1 | KEY2 | ... | ID ;

The reason this doesn't work is that wherever you have a production that says "( KEY1 | identifier )", ANTLR reports a conflict and disables the second alternative unless it can resolve the conflict with its extended lookahead. So depending on the order you put them in, either KEY1 or identifier gets disabled, and neither gives you a working parser. The only way to make this work would be to define a different identifier production for every situation, with the conflicting keywords removed. That might work for a handful of keywords, but it doesn't scale.

The other suggested solution is to create productions like this:

    keyIF : {input.LT(1).getText().equals("if")}? ID ;
    keyCALL : {input.LT(1).getText().equals("call")}? ID ;

This is even worse than the first solution. I don't think there is any way to make it work at all. The problem is that the semantic predicate doesn't get applied when looking ahead from other productions, only when actually trying to match the keyXXX productions. So everywhere you have "( keyXXX | identifier )", to ANTLR this looks like "( ID | ID )". This completely throws the lookahead for a loop. In many cases it will actually generate a parser without warnings, but the lookahead it performs is just wrong.

For example, I had a case something like "(keyA keyB LEFT_BRACKET) | (keyC keyD keyE)". To ANTLR's lookahead analysis this said "(ID ID LEFT_BRACKET) | (ID ID ID)", so it generated lookahead that basically said "if (LT(3) == LEFT_BRACKET) option1 else option2".
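
To make that decision concrete, here is a rough sketch in plain Java, without the ANTLR runtime. It is not generated ANTLR code: the class, the token types, and the keyword texts "A" through "E" are all made up for illustration. It only mimics the shape of the problem, a prediction that looks at token types alone followed by predicates that check the text afterwards.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a generated parser; none of these names come from ANTLR.
class LookaheadSketch {
    enum Type { ID, LEFT_BRACKET }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Mimics the decision derived from "(ID ID LEFT_BRACKET) | (ID ID ID)":
    // only the type of the third token is examined, never the keyword text.
    static int predictAlternative(List<Token> input) {
        return input.get(2).type == Type.LEFT_BRACKET ? 1 : 2;
    }

    // Mimics a keyXXX rule: the text check runs only after the alternative is chosen.
    static void matchKeyword(Token t, String expected) {
        if (!t.text.equals(expected)) {
            throw new IllegalStateException(
                "failed predicate: expected '" + expected + "' but found '" + t.text + "'");
        }
    }

    static String parse(List<Token> input) {
        if (predictAlternative(input) == 1) {
            matchKeyword(input.get(0), "A");
            matchKeyword(input.get(1), "B");
            return "alt1";
        }
        matchKeyword(input.get(0), "C");
        matchKeyword(input.get(1), "D");
        matchKeyword(input.get(2), "E");
        return "alt2";
    }

    public static void main(String[] args) {
        // This input belongs to neither alternative, yet prediction commits to alternative 1.
        List<Token> input = Arrays.asList(
            new Token(Type.ID, "C"),
            new Token(Type.ID, "D"),
            new Token(Type.LEFT_BRACKET, "("));
        System.out.println("predicted alternative: " + predictAlternative(input));
        try {
            parse(input);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the prediction step never sees the keyword text, so the mismatch surfaces only as a predicate failure at run time.
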
So it was taking other things like "C D (" and trying to parse them as "A B (", then of course throwing runtime exceptions on the failed semantic predicates.

The hybrid approach I came up with was to define the keywords as separate token types and define an identifier rule as in the first approach, but then also create keyword productions similar to the second approach:

    keyIF : {input.LT(1).getText().equals("if")}? IF ;

I know that at first glance the semantic predicate seems pointless. But it causes ANTLR to ignore the conflicts it would otherwise have reported, resulting in a working grammar.

There are two caveats with this approach.

First, it can also suppress reporting of real conflicts, causing bugs in your parser. To avoid this, it's a good idea to first develop and test the grammar without the semantic predicates in the keyword rules and without the keywords as alternatives in the identifier rule. Once you have the grammar working this way with no conflicts, it's safe to add the pieces that handle the non-reserved keywords.

Second, all those semantic predicates can cause the size of the generated parser to blow up. They can cause syntactic predicates to have a lot of special states and make the DFAs huge. This may only be a problem with the Java runtime, where the JVM limits the bytecode of a single method to 64KB. With large grammars or complex lookahead situations, the result can be a generated Java class that will not compile.

Hope this helps,
Ron

--
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent those of my employer.

List: http://www.antlr.org/mailman/listinfo/antlr-interest
