[il-antlr-interest: 0] Re: [antlr-interest] Lex Matching Issues

John B. Brodie Mon, 19 Jul 2010 09:20:26 -0700

Greetings!
On Mon, 2010-07-19 at 09:52 -0600, Cid Dennis wrote:
> So I am new to ANTLR and have created a grammar but found a strange issue.  
> Because of the structure of the language I am parsing there can be tokens 
> that match reserved words as variables but only when they are in a sub rule 
> that does not use the reserved word.
> 
> In the example below "ruleset" is seen by the parser in two different ways.  
> The first is for the 'ruleset' token and the second is as a VAR token.  The 
> problem is when the parser sees the second ruleset it is thinking the token 
> is the "ruleset" token not the "VAR" token so it returns Mismatch token 
> exception.  
> 
> How can I make it so that I can do this kind of parsing?   One workaround I 
> came up with was to change 'ruleset' in the grammar to be a VAR  but then it 
> is not easy to see what the grammar looks like.  
> 
> In the end I do not care what the token is considered (VAR or 'ruleset') as 
> long as the parser does the right thing and can parse the "assignment" if 
> 'ruleset' is used on the left hand side of the assignment.   
> 
> 
> Simple Example Input:
> 
> ruleset joe {
>       rule myrulename is active {
>               ruleset = "test";
>       }       
> }
> 
> Simple Grammar:
> 
> grammar test;
> options {
>   output=AST;
> }
> 
> ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_')*
>     ;
> 
> COMMENT
>     :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>     |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>     ;
> 
> WS  :   ( ' '
>         | '\t'
>         | '\r'
>         | '\n'
>         ) {$channel=HIDDEN;}
>     ;
> 
> STRING
>     :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
>     ;
> 
> fragment
> EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> 
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
> 
> fragment
> ESC_SEQ
>     :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
>     |   UNICODE_ESC
>     |   OCTAL_ESC
>     ;
> 
> fragment
> OCTAL_ESC
>     :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7')
>     ;
> 
> fragment
> UNICODE_ESC
>     :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
>     ;
>     
>     
> ruleset :     
>       'ruleset' ID '{' rule* '}'
>       ;
>       
> rule  :
>       'rule' ID 'is' ('active'|'inactive'|'test') '{' assignment* '}'
>       ;
> 
> 
> assignment :  
>       ID '=' STRING ';'
>       ;
> 
>       
>


This is a fairly frequently asked question. Please try to search the
mail archives and/or the wiki at antlr.org.

One of the usual solutions, I believe, is to create a parser rule that
accepts your ID along with the keywords that are appropriate. So your
assignment rule would become something like (untested):

assignment : lhs '=' STRING ';' ;
lhs : ID | 'ruleset' /* other keyword alternatives go here */ ;

A downside of this approach is that you have to be very careful not to
introduce ambiguities, probably by having a different parser rule for
each context, which can get large and ugly...
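To make the mechanism concrete, here is a small hand-written sketch in Python (not ANTLR output, and not tested against ANTLR itself). It assumes a lexer that, like ANTLR literals, gives each keyword its own token type, and a hypothetical `parse_assignment` helper standing in for the `assignment`/`lhs` rules above. All names are made up for illustration.

```python
import re

# Keywords taken from the grammar in the question.
KEYWORDS = {"ruleset", "rule", "is", "active", "inactive", "test"}
TOKEN_RE = re.compile(
    r'\s*(?:(?P<ID>[A-Za-z_]+)|(?P<STRING>"[^"]*")|(?P<PUNCT>[{}=;]))'
)

def tokenize(text):
    """Return (type, text) pairs; a keyword gets its own type, never ID."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "PUNCT" or (kind == "ID" and value in KEYWORDS):
            kind = value  # punctuation and keywords are their own token type
        tokens.append((kind, value))
    return tokens

def parse_assignment(tokens, lhs_keywords=frozenset()):
    """assignment : lhs '=' STRING ';'   with   lhs : ID | allowed keywords."""
    if len(tokens) != 4:
        raise SyntaxError("expected: lhs '=' STRING ';'")
    (k0, name), (k1, _), (k2, value), (k3, _) = tokens
    if k0 != "ID" and k0 not in lhs_keywords:
        raise SyntaxError(f"mismatched token {name!r}, expecting ID")
    if (k1, k2, k3) != ("=", "STRING", ";"):
        raise SyntaxError("expected '=' STRING ';'")
    return name, value[1:-1]  # strip the surrounding quotes
```

With an empty `lhs_keywords`, `ruleset = "test";` fails with a mismatched-token error, which mirrors the reported problem. Passing `{"ruleset"}` plays the role of the extra `'ruleset'` alternative in `lhs` and lets the same input parse.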

Another solution is to have no keywords in the lexer at all and to use
parser predicates to identify the keywords. I do not usually use
predicates, so I do not remember the specific meta-syntax, but it would
be something like:

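ruleset_kw : {input.LT(1).getText().equals("ruleset")}?=> ID ;
// Java target assumed. Replace every 'ruleset' literal with a reference to
// ruleset_kw (renamed because the grammar already has a rule named ruleset).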
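The idea behind the predicate approach can be sketched outside ANTLR as a hand-written recursive-descent parser in Python. This is only an illustration under assumptions: the lexer below knows no keywords (every word is an ID), and the hypothetical `keyword()` method plays the role of the predicate by checking the text of the next ID. None of these names come from ANTLR.

```python
import re

TOKEN_RE = re.compile(
    r'\s*(?:(?P<ID>[A-Za-z_]+)|(?P<STRING>"[^"]*")|(?P<PUNCT>[{}=;]))'
)

def tokenize(text):
    """Lexer with no keywords: every word is an ID token."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        value = m.group(kind)
        tokens.append((value if kind == "PUNCT" else kind, value))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def match(self, kind):
        tok_kind, text = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, found {text!r}")
        self.pos += 1
        return text

    def keyword(self, *words):
        """Predicate-like check: the next token is an ID whose text is a keyword."""
        kind, text = self.peek()
        if kind != "ID" or text not in words:
            raise SyntaxError(f"expected one of {words}, found {text!r}")
        self.pos += 1
        return text

    def ruleset(self):  # ruleset : 'ruleset' ID '{' rule* '}'
        self.keyword("ruleset")
        name = self.match("ID")
        self.match("{")
        rules = []
        while self.peek() == ("ID", "rule"):
            rules.append(self.rule())
        self.match("}")
        return name, rules

    def rule(self):  # rule : 'rule' ID 'is' state '{' assignment* '}'
        self.keyword("rule")
        name = self.match("ID")
        self.keyword("is")
        state = self.keyword("active", "inactive", "test")
        self.match("{")
        assignments = []
        while self.peek()[0] == "ID":
            assignments.append(self.assignment())
        self.match("}")
        return name, state, assignments

    def assignment(self):  # assignment : ID '=' STRING ';'
        target = self.match("ID")
        self.match("=")
        value = self.match("STRING")
        self.match(";")
        return target, value[1:-1]
```

Because a word is treated as a keyword only where the parser asks for one, `ruleset` is an ordinary ID on the left-hand side of an assignment. Calling `Parser(tokenize(source)).ruleset()` on the sample input from the question parses without a mismatch.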


hope this helps...
   -jbb


