[il-antlr-interest: 28074] [antlr-interest] Literals and subrules

Kenneth Domino Thu, 25 Feb 2010 02:23:02 -0800

Hi All,

I'm not sure I understand why the following grammars, which I thought should 
recognize the same language, do not all work.  The differences are in the use 
of literals in the parser rules versus literals in the lexical analyzer rules, 
and in the use of parentheses for sub-rules.  The language is very simple: just 
a single letter followed by the end of file.  Can someone explain why some work 
and others do not?
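
For reference, the intended language can be written out by hand.  What follows 
is a minimal plain-Java sketch, not ANTLR output.  The class name SingleLetter 
and the method name accepts are made up for illustration, and the sketch 
assumes the input has no trailing newline.

```java
public class SingleLetter {
    // Returns true only if the input is exactly one ASCII letter
    // (a-z or A-Z) with nothing after it, i.e. a letter followed by EOF.
    static boolean accepts(String input) {
        if (input.length() != 1) {
            return false;
        }
        char c = input.charAt(0);
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(accepts("i"));   // true
        System.out.println(accepts("ab"));  // false: more than one character
        System.out.println(accepts("1"));   // false: not a letter
    }
}
```

All four grammars below are meant to accept exactly the inputs for which 
accepts returns true.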


1) This grammar places the literals in the parser rules.  ANTLR generates a 
parser (the tool exits with status 0), but the generated Java does not compile.

  $ cat Doit1.g
  grammar Doit1;

  prog:
          id
          EOF
      ;

  id:
          'a' .. 'z'
          | 'A' .. 'Z'
      ;
  $ java org.antlr.Tool Doit1.g

  $ javac Doit1*.java
  Doit1Parser.java:71: illegal start of expression
              if (  ) {
                    ^
  1 error



2) The second grammar, Doit2, places parentheses around the literal ranges 
because I eventually want to recognize more than one character using the '+' 
operator on a sub-rule.  According to the documentation 
(http://www.antlr.org/wiki/display/ANTLR3/Grammars), parentheses form a 
"Subrule. Like a call to a rule with no name.", so this should be legal.  
Unfortunately, ANTLR reports a warning and an error about EOF for this grammar.  
I don't understand why a sub-rule does not work here.

  $ cat Doit2.g
  grammar Doit2;

  prog:
          id
          EOF
      ;

  id:
          ( 'a' .. 'z' )
          | ( 'A' .. 'Z' )
      ;

  $ java org.antlr.Tool Doit2.g
  warning(200): Doit2.g:8:3: Decision can match input such as "EOF" using multiple alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  error(201): Doit2.g:8:3: The following alternatives can never be matched: 2


3) This grammar, Doit3, places the literals in the lexer rules.  ANTLR produces 
a parser and lexer that compile, and the recognizer accepts the language.

  $ cat Doit3.g
  grammar Doit3;

  prog:
          ID
          EOF
      ;

  ID:
          'a' .. 'z'
          | 'A' .. 'Z'
      ;

  $ java org.antlr.Tool Doit3.g

  $ javac Doit3*.java

  $ java Doit3 < i

  $


4) This grammar, Doit4, is almost the same as Doit3, but uses parentheses for 
sub-rules.  This grammar works, but I'm not sure why, because that seems 
inconsistent with Doit2, which uses the same parentheses and fails.

  $ cat Doit4.g
  grammar Doit4;

  prog:
          ID
          EOF
      ;

  ID:
          ( 'a' .. 'z' )
          | ( 'A' .. 'Z' )
      ;

  $ java org.antlr.Tool Doit4.g

  $ javac Doit4*.java

  $ java Doit4 < i

  $

Usually, I would simply move all literals to the lexical analyzer (i.e., use 
literals only in lexer rules) because that is how parsers and lexers were 
traditionally written.  But I often see ANTLR grammars with literals sprinkled 
throughout both parser and lexer rules, so I thought I would give it a try.

Ken
