[il-antlr-interest: 29953] Re: [antlr-interest] Why does ANTLR generate code that will never call an OR'd alternative?

Kevin J. Cummings Sat, 21 Aug 2010 00:42:18 -0700

On 08/21/2010 03:27 AM, Avid Trober wrote:
> Gerald,
> 
> Thank you very much for your reply.
> 
> There's no alt skipped message in the error log.
> 
> The 'isToken' rule was simply my attempt to have the parser check whether the
> token was in the tokens { ... } section.  At runtime, I found the token type
> to always be the value from the tokens { ... } section, even if I tried to
> change it:
> 
>       isToken :       {isToken(input.LT(1))}? IDENTIFIER;
> 
> But, 'isToken' would never get called via the generated code, e.g. 
> 
>       identifier  :  isToken | IDENTIFIER;   // i.e. treat a token in the
>                                              // tokens section as an IDENTIFIER.


You need to move your semantic predicate.  Both alternatives begin with
IDENTIFIER, so the lookahead alone cannot tell them apart.  If you want
the parser to go through isToken, you need to move the semantic predicate
into the "identifier" rule.
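
As a rough, untested sketch of what that might look like (it reuses the
isToken(...) helper and the rule names from your mail; nothing else is
assumed):

      identifier
              :       {isToken(input.LT(1))}? IDENTIFIER   // predicate now gates this alternative directly
              |       IDENTIFIER                            // taken when the predicate is false
              ;

Note that both alternatives still match a token of type IDENTIFIER, so
the predicate only picks between them once the lookahead already has
that type.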

> Therefore, I modified my 'identifier' rule to have each tokens { ... } value
> in it, e.g.
> 
>       identifier:
>               ( 'TOKEN1' | 'TOKEN2' | ... | 'TOKEN_ELEVENTYTEEN_THOUSAND' )
>                 { input.LT(-1).Type = IDENTIFIER; }
>               | IDENTIFIER;
> 
> And,  that worked.  That is, if I have "identifier" in the grammar somewhere
> it will now accept an IDENTIFIER, as it always has, but also any 'TOKEN1',
> 'TOKEN2', etc. value found in tokens { ... }
> 
> Personally, I hate this.  It means I need *two* places in my grammar to list
> the keywords, the tokens { ... } section AND the identifier rule.  I'm sure
> there's some way to do it via an action, predicate, whatever.  
> 
> I went down this path due to this recommendation: "The author's
> recommendation is to use ordinary rules and the tokens command." at
> http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required
> 
> It appears the tokens section is NOT the way to go; perhaps I should have
> per-token rules instead, e.g. keyToken1, keyToken2, etc.  But I can't rewrite
> this grammar and risk breaking other things.  Perhaps I should in the future.
> Ideally, I'd simply like a way to scan through the tokens and, if one is
> found, note it and then change the token type to IDENTIFIER, without listing
> all the tokens twice in the grammar.
> 
> Any suggestions very, very welcome. 
> 
> Regards,
> Trober
> 
> 
> 
> 
> -----Original Message-----
> From: Gerald Rosenberg [mailto:[email protected]] 
> Sent: Saturday, August 21, 2010 1:35 AM
> To: Avid Trober
> Cc: [email protected]
> Subject: Re: [antlr-interest] Why does ANTLR generate code that will never
> call an OR'd alternative?
> 
>   Most likely, the parser generation analysis determined that isToken 
> can never be reached.  Check your error log for an alt skipped message.
> 
> 
> 
> ------ Original Message (Saturday, August 21, 2010 1:01:20 AM) From: Avid Trober ------
> Subject: [antlr-interest] Why does ANTLR generate code that will never call
> an OR'd alternative?
>> For this rule,
>>
>> identifier
>>                  :       isToken | IDENTIFIER;
>>
>> ANTLR generates code that would never call the isToken rule
>> (target=CSharp2):
>>
>>      public MYParser.identifier_return identifier()    // throws RecognitionException [1]
>>      {
>> .
>>              // .  : ( isToken | IDENTIFIER )
>>              int alt30 = 2;
>>              int LA30_0 = input.LA(1);
>>
>>              if ( (LA30_0 == IDENTIFIER) )   //<== token must be IDENTIFIER to call isToken???
>>              {
>>                  int LA30_1 = input.LA(2);
>>
>>                  if ( ((isToken(input.LT(1)))) )  //<== why must LA30_0 == IDENTIFIER to call isToken?
>>                  {
>>                      alt30 = 1;
>>                  }
>>                  else if ( (true) )
>>                  {
>>                      alt30 = 2;
>>                  }
>> .
>>              else                         //<== since not IDENTIFIER, why not call isToken here???
>>              {
>>                  NoViableAltException nvae_d30s0 =
>>                      new NoViableAltException("", 30, 0, input);
>>
>>                  throw nvae_d30s0;
>>              }
>>
>> I would think it's something to do with DFA optimization?  Perhaps that's
>> why IDENTIFIER is checked first.
>>
>> But if the token is not an IDENTIFIER, why not call isToken???  After all,
>> the rule is IDENTIFIER  ****OR***** isToken.
>>
>>
>>
>> Thanks,
>>
>> Trober
>>
>>
>>
>>
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
> 
> 


-- 
Kevin J. Cummings
[email protected]
[email protected]
[email protected]
Registered Linux User #1232 (http://counter.li.org)


