[il-antlr-interest: 29955] Re: [antlr-interest] Why does ANTLR generate code that will never call an OR'd alternative?

Kevin J. Cummings Sat, 21 Aug 2010 07:24:54 -0700

On 08/21/2010 04:00 AM, Avid Trober wrote:
> Kevin,
> 
> Thanks for taking the time to reply.  
> 
> I did have the predicate in the identifier rule, but apparently the wrong
> way around:
> 
>       identifier 
>       :        {isToken(input.LT(1))}?  IDENTIFIER  | IDENTIFIER;


Why can't you do something like:

identifier: i=IDENTIFIER
        { if (isToken($i))
            { // code here for the isToken case
            }
          else
            { // code here (maybe empty) for the other case
            }
        }
        ;
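
For illustration only, here is a minimal Python sketch of that shape. It is
hand-written, not ANTLR output: the rule always matches one IDENTIFIER and
the action branches afterwards. The Token class, the token type number, the
keyword set, and what is_token actually tests are all assumptions; the
thread never shows the body of isToken.

```python
from dataclasses import dataclass

IDENTIFIER = 4  # hypothetical token type number


@dataclass
class Token:
    type: int
    text: str


# Assumed keyword spellings; a stand-in for whatever isToken really checks.
KEYWORD_TEXTS = {"TOKEN1", "TOKEN2", "TOKEN3"}


def is_token(tok):
    """Hypothetical equivalent of the isToken helper from the thread."""
    return tok.text in KEYWORD_TEXTS


def identifier(tokens, pos):
    """Match one IDENTIFIER unconditionally, then branch in the action."""
    if pos >= len(tokens) or tokens[pos].type != IDENTIFIER:
        raise SyntaxError("expected IDENTIFIER")
    tok = tokens[pos]
    if is_token(tok):
        kind = "keyword-as-identifier"  # code for the isToken case
    else:
        kind = "plain-identifier"       # code (maybe empty) for the other case
    return kind, pos + 1
```

The point of the design is that no prediction decision depends on the
helper: it is just an ordinary call made after the token has been matched.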

> The above still produced code that would never call isToken.  I wrote it
> that way because I thought the predicate had to change the token type
> (from the tokens section value to IDENTIFIER); hence the IDENTIFIER
> after the predicate.
> 
> Per your email, I tried this:
> 
>       identifier 
>       :        {isToken(input.LT(1))}?  | IDENTIFIER;

The first alternative here is empty, so it won't match anything; in order
for isToken to be called, the lookahead would have to *not* be an IDENTIFIER.

> And, ANTLR generated code that would call isToken.  But, isToken could also
> be called on the right side of the OR in the 'identifier' rule (see code
> below).
> But, worse:
> 
> 1. The identifier rule doesn't work in the above form.  I get unexpected
> token exceptions when a token from the tokens section is used where a
> non-keyword identifier is meant.
> 
> 2. Check out this first "if" for a simple list of tokens.  Some checks are
> for the value of the token (e.g. TOKEN1, TOKEN10) and others are value
> range checks (e.g. (LA30_0 >= TOKEN2 && LA30_0 <= TOKEN3)).  The latter I
> could understand, if it weren't for the fact that the TOKEN2 and TOKEN3
> values are 5 and 6!
> 
> 
>             if ( (LA30_0 == TOKEN1
>                   || (LA30_0 >= TOKEN2 && LA30_0 <= TOKEN3)
>                   || (LA30_0 >= TOKEN4 && LA30_0 <= TOKEN5)
>                   || (LA30_0 >= TOKEN6 && LA30_0 <= TOKEN7)
>                   || (LA30_0 >= TOKEN8 && LA30_0 <= TOKEN9)
>                   || LA30_0 == TOKEN10
>                   || LA30_0 == TOKEN11
>                   || (LA30_0 >= TOKEN12 && LA30_0 <= TOKEN13)) )
>             {
>                 alt30 = 1;
>             }
>             else if ( (LA30_0 == IDENTIFIER) )
>             {
>                 int LA30_2 = input.LA(2);
> 
>                 if ( ((isToken(input.LT(1)))) )
>                 {
>                     alt30 = 1;
>                 }
>                 else if ( (true) )
>                 {
>                     alt30 = 2;
>                 }
>                 else 
>                 {
>                     NoViableAltException nvae_d30s2 =
>                         new NoViableAltException("", 30, 2, input);
> 
>                     throw nvae_d30s2;
>                 }
>             }
>             else 
>             {
>                 NoViableAltException nvae_d30s0 =
>                     new NoViableAltException("", 30, 0, input);
> 
>                 throw nvae_d30s0;
>             }
>             switch (alt30) 
>             {
>                 case 1 :
>                     // ... : {...}?
>                     {
>                       root_0 = (object)adaptor.GetNilNode();
> 
>                       if ( !((isToken(input.LT(1)))) ) 
>                       {
>                           throw new FailedPredicateException(input,
>                               "identifier", "isToken(input.LT(1))");
>                       }
> 
>                     }
>                     break;
>                 case 2 :
>                     // ... : IDENTIFIER
>                     {
>                       root_0 = (object)adaptor.GetNilNode();
> 
>  
>                       IDENTIFIER132 = (IToken)Match(input, IDENTIFIER,
>                           FOLLOW_IDENTIFIER_in_identifier1562);
>                       IDENTIFIER132_tree = (object)adaptor.Create(IDENTIFIER132);
>                       adaptor.AddChild(root_0, IDENTIFIER132_tree);
> 
> 
>                     }
>                     break;
> 
>             }
> 
> 
> The only form of the 'identifier' rule I got to work was this:
> 
>       identifier 
>       :       
>         (      'TOKEN1' 
>         |      'TOKEN2'       
>         |      'TOKEN3'
>               ...
>         |      'TOKEN_ZILLION')   { input.LT(-1).Type = IDENTIFIER; } 
>         |       IDENTIFIER;
> 
> 
> Now, I can use a tokens keyword in a way the parser won't throw an
> exception:
> 
>       TOKEN1=TOKEN3
> 
>       And, 'TOKEN3' doesn't trip up the parser.
> (For the above, the rule is:
> 
>       TOKEN1=identifier
> 
> Which never worked before if the right-side of the equal sign was a token in
> the tokens section).

In cases like this, I have done:

keyword : 'TOKEN1'
        | 'TOKEN2'
        | 'TOKEN3'
          ...
        | 'LAST_TOKEN'
        ;

identifier : IDENTIFIER
           | k:keyword
             { #k->setType(IDENTIFIER); }
           ;

(OK, this is with ANTLR 2.7.7 and the C++ target, but it should be
similar with ANTLR 3.)
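
A minimal sketch of the same type-clobbering idea in plain Python,
hand-written rather than ANTLR-generated. The Token class, the token type
numbers, and the function name are hypothetical; only the technique (accept
a keyword where an identifier is expected, then overwrite its type) comes
from the rule above.

```python
from dataclasses import dataclass

# Hypothetical token type numbers; a real grammar assigns its own.
IDENTIFIER = 4
TOKEN1, TOKEN2, TOKEN3 = 5, 6, 7
EQUALS = 8
KEYWORD_TYPES = {TOKEN1, TOKEN2, TOKEN3}


@dataclass
class Token:
    type: int
    text: str


def identifier(tokens, pos):
    """Accept IDENTIFIER or any keyword; return (token, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.type in KEYWORD_TYPES:
        # Keyword used where an identifier is expected: clobber its type.
        tok.type = IDENTIFIER
    elif tok.type != IDENTIFIER:
        raise SyntaxError(f"unexpected token {tok.text!r}")
    return tok, pos + 1


if __name__ == "__main__":
    # TOKEN1=TOKEN3: the right-hand side is a keyword used as an identifier.
    stream = [Token(TOKEN1, "TOKEN1"), Token(EQUALS, "="), Token(TOKEN3, "TOKEN3")]
    print(identifier(stream, 2))
```

Only the token that is actually consumed by identifier gets retyped, so
TOKEN1 on the left of the equals sign keeps its keyword type.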

> I don't like my solution, listing the tokens twice in the grammar file.
> And, would love to know how a pro would solve it.  For a start, should
> (or must) I take all the tokens out of the tokens section and, perhaps,
> make per-token rules for them?

I wouldn't use a semantic predicate for this; rather, I'd just clobber
the token type when I knew it was an identifier and not a keyword.

This question comes up rather often on this list.

> Regards,
> Trober

-- 
Kevin J. Cummings
[email protected]
Registered Linux User #1232 (http://counter.li.org)

