Greetings!
On Mon, 2010-07-19 at 09:52 -0600, Cid Dennis wrote:
> So I am new to ANTLR and have created a grammar but found a strange issue.
> Because of the structure of the language I am parsing there can be tokens
> that match reserved words as variables but only when they are in a sub rule
> that does not use the reserved word.
>
> In the example below "ruleset" is seen by the parser in two different ways.
> The first is for the 'ruleset' token and the second is as a VAR token. The
> problem is when the parser sees the second ruleset it is thinking the token
> is the "ruleset" token not the "VAR" token so it returns Mismatch token
> exception.
>
> How can I make it so that I can do this kind of parsing? One work around I
> came up with was to change 'ruleset' in the grammar to be a VAR but then it
> is not easy to see what the grammar looks like.
>
> In the end I do not care what the token is considered (VAR or 'ruleset') as
> long as the parser does the right thing and can parse the "assignment" if
> 'ruleset' is used on the left hand side of the assignment.
>
>
> Simple Example Input:
>
> ruleset joe {
> rule myrulename is active {
> ruleset = "test";
> }
> }
>
> Simple Grammar:
>
> grammar test;
> options {
> output=AST;
> }
>
> ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_')*
> ;
>
> COMMENT
> : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> ;
>
> WS : ( ' '
> | '\t'
> | '\r'
> | '\n'
> ) {$channel=HIDDEN;}
> ;
>
> STRING
> : '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
> ;
>
> fragment
> EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
>
> fragment
> ESC_SEQ
> : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> | UNICODE_ESC
> | OCTAL_ESC
> ;
>
> fragment
> OCTAL_ESC
> : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
> | '\\' ('0'..'7') ('0'..'7')
> | '\\' ('0'..'7')
> ;
>
> fragment
> UNICODE_ESC
> : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
> ;
>
>
> ruleset :
> 'ruleset' ID '{' rule* '}'
> ;
>
> rule :
> 'rule' ID 'is' ('active'|'inactive'|'test') '{' assignment* '}'
> ;
>
>
> assignment :
> ID '=' STRING ';'
> ;
>
>
>
This is a fairly frequently asked question; searching the mail archives
and/or the wiki at antlr.org should turn up earlier threads on it.
One of the usual solutions, I believe, is to create a parser rule that
accepts ID along with whichever keywords are appropriate in that
context. So your assignment rule would become something like (untested):
assignment : lhs '=' STRING ';' ;
lhs : ID | 'ruleset' /* other keyword alternatives go here */ ;
A downside to this approach is that one has to be very careful not to
introduce ambiguities, probably by having a different parser rule for
each context, which can get large and ugly...
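Another solution is to not have any keywords in the lexer at all, so that
every word comes out as an ID, and to use semantic predicates in the
parser to identify the keywords. I do not usually use predicates, so I do
not remember the specific meta-syntax, but for the Java target it would
be something like (untested):
kw_ruleset : {input.LT(1).getText().equals("ruleset")}?=> ID ;
// and replace every 'ruleset' literal with a reference to kw_ruleset instead
// (named kw_ruleset here so it does not clash with the existing ruleset rule;
// note that == on Java strings compares references, hence equals())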
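
To make the predicate idea concrete outside of ANTLR, here is a small hand-written sketch in Python. It is not ANTLR output and not anything from the original grammar beyond the rule names: the tokenizer, the Parser class and its keyword() helper are all hypothetical stand-ins, and COMMENT and escape handling are left out. The lexer only ever produces ID for words, and the parser checks the token text wherever a keyword is required, which is the role the predicate plays.

```python
import re

# Hypothetical stand-in for a lexer with no keyword tokens: words are always ID.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<ID>[A-Za-z_]+)|(?P<STRING>"[^"]*")|(?P<SYM>[{}=;]))'
)


def tokenize(text):
    """Return a list of (kind, text) pairs for the simplified language."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def keyword(self, *words):
        # Plays the role of the predicate: accept an ID only if its text
        # is one of the keywords allowed at this point in the grammar.
        kind, text = self.peek()
        if kind != "ID" or text not in words:
            raise SyntaxError(f"expected one of {words}, got {text!r}")
        self.pos += 1
        return text

    def ruleset(self):
        self.keyword("ruleset")
        name = self.expect("ID")
        self.expect("SYM", "{")
        rules = []
        while self.peek() != ("SYM", "}"):
            rules.append(self.rule())
        self.expect("SYM", "}")
        return ("ruleset", name, rules)

    def rule(self):
        self.keyword("rule")
        name = self.expect("ID")
        self.keyword("is")
        state = self.keyword("active", "inactive", "test")
        self.expect("SYM", "{")
        assignments = []
        while self.peek() != ("SYM", "}"):
            assignments.append(self.assignment())
        self.expect("SYM", "}")
        return ("rule", name, state, assignments)

    def assignment(self):
        # Any ID is fine here, including one whose text is "ruleset".
        target = self.expect("ID")
        self.expect("SYM", "=")
        value = self.expect("STRING")
        self.expect("SYM", ";")
        return ("assign", target, value)


SOURCE = '''
ruleset joe {
  rule myrulename is active {
    ruleset = "test";
  }
}
'''

tree = Parser(tokenize(SOURCE)).ruleset()
print(tree)
```

Because "ruleset" is only treated as a keyword where the parser asks for one, the second occurrence in the sample input is accepted as an ordinary assignment target. The cost is the same as with predicates in ANTLR: every keyword check is a string comparison in the parser rather than a token type decided once in the lexer.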
Hope this helps...
-jbb