I think I've found a problem with ANTLR's lexer generation. Here's a small  
example that shows the problem:

grammar Problem;

EOL    : '\n' ;
BAB    : 'BAB' ;

SYMBOL : 'A'|'B' ;

line
     : SYMBOL* EOL
     ;

script
     :
     ( line
         {
             System.out.print($line.text);
         }
     )*
     ;


My test input looks like this:

AAA
AAB
ABB
BAA


Lines 1-3 parse fine, but the lexer barfs on line 4. Here's the output:

line 4:2 mismatched character 'A' expecting 'B'
AAA
AAB
ABB


The prediction code in the lexer's mTokens method looks like this (with my  
comments):

         switch ( input.LA(1) ) {
         case '\n':
             {
             alt1=1; // EOL
             }
             break;
         case 'B':
             {
             int LA1_2 = input.LA(2);

             if ( (LA1_2=='A') ) {
                 alt1=2; // BAB
             }
             else {
                 alt1=3;} // SYMBOL
             }
             break;
         case 'A':
             {
             alt1=3; // SYMBOL
             }
             break;
         default:
             NoViableAltException nvae =
                 new NoViableAltException("", 1, 0, input);

             throw nvae;
         }


It looks like the lexer commits to BAB after seeing only 'B' followed by
'A', without looking far enough ahead to check that the third character is
really a 'B'. I've seen the same behaviour in 3.0 and 3.1.
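
To make the failure concrete, here is a minimal standalone sketch that mimics the prediction switch quoted above and then the literal match of 'BAB'. It does not use the ANTLR runtime: the class PredictionSketch and the helpers predictAlt, la and matchBab are hypothetical names made up for illustration, and '\0' stands in for end of input.

```java
public class PredictionSketch {

    // 1-based lookahead, like input.LA(k) in the generated lexer.
    // Assumption: '\0' is returned past the end of the input.
    public static char la(String input, int pos, int k) {
        int i = pos + k - 1;
        return i < input.length() ? input.charAt(i) : '\0';
    }

    // Mirrors the quoted switch: 1 = EOL, 2 = BAB, 3 = SYMBOL, -1 = no viable alt.
    public static int predictAlt(String input, int pos) {
        switch (la(input, pos, 1)) {
            case '\n':
                return 1; // EOL
            case 'B':
                // Only the second character is inspected before committing to BAB.
                return la(input, pos, 2) == 'A' ? 2 : 3;
            case 'A':
                return 3; // SYMBOL
            default:
                return -1;
        }
    }

    // Once alt 2 is chosen, 'BAB' is matched character by character.
    // Returns the index of the first mismatch, or -1 if the literal matches.
    public static int matchBab(String input, int pos) {
        String literal = "BAB";
        for (int i = 0; i < literal.length(); i++) {
            if (la(input, pos, i + 1) != literal.charAt(i)) {
                return pos + i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String line4 = "BAA\n";
        System.out.println("predicted alt: " + predictAlt(line4, 0));   // predicted alt: 2
        System.out.println("mismatch at column: " + matchBab(line4, 0)); // mismatch at column: 2
    }
}
```

For "BAA" the sketch picks alternative 2 (BAB) from two characters of lookahead and then fails at column 2, which lines up with the "line 4:2 mismatched character 'A' expecting 'B'" error above.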

Is this a problem with ANTLR or my expectations?


Cheers,
Ned.

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev
