I think I've found a problem with ANTLR's lexer generation. Here's a small
example that shows the problem:
    grammar Problem;

    EOL    : '\n' ;
    BAB    : 'BAB' ;
    SYMBOL : 'A'|'B' ;

    line
        : SYMBOL* EOL
        ;

    script
        : ( line
            {
                System.out.print($line.text);
            }
          )*
        ;
My test input looks like this:
AAA
AAB
ABB
BAA
Lines 1-3 parse fine, but the lexer barfs on line 4. Here's the output:
line 4:2 mismatched character 'A' expecting 'B'
AAA
AAB
ABB
The prediction code in the lexer's mTokens method looks like this (with my
comments):
    switch ( input.LA(1) ) {
    case '\n':
        {
        alt1=1; // EOL
        }
        break;
    case 'B':
        {
        int LA1_2 = input.LA(2);
        if ( (LA1_2=='A') ) {
            alt1=2; // BAB
        }
        else {
            alt1=3; // SYMBOL
        }
        }
        break;
    case 'A':
        {
        alt1=3; // SYMBOL
        }
        break;
    default:
        NoViableAltException nvae =
            new NoViableAltException("", 1, 0, input);
        throw nvae;
    }
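It looks like the prediction isn't looking far enough ahead: after seeing
'B' followed by 'A' it commits to BAB without checking that the third
character is also 'B'. On "BAA" the BAB rule then fails at the third
character instead of falling back to SYMBOL. I've seen the same behaviour
in 3.0 and 3.1.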
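
To make the difference concrete, here is a minimal standalone sketch of the
decision above. It is not ANTLR code: the class and method names
(PredictionSketch, la, predictTwoChars, predictThreeChars) are made up for
illustration, and the three-character variant is only what I expected the
prediction to do, not something ANTLR generates.

```java
public class PredictionSketch {
    // Hypothetical stand-in for input.LA(k): the k-th character from pos, or -1 past the end.
    static int la(String s, int pos, int k) {
        int i = pos + k - 1;
        return i < s.length() ? s.charAt(i) : -1;
    }

    // Mirrors the generated switch: 1 = EOL, 2 = BAB, 3 = SYMBOL, 0 = no viable alternative.
    static int predictTwoChars(String s, int pos) {
        switch (la(s, pos, 1)) {
            case '\n': return 1;
            case 'B':  return la(s, pos, 2) == 'A' ? 2 : 3;
            case 'A':  return 3;
            default:   return 0;
        }
    }

    // Assumed "expected" behaviour: only choose BAB when the third character is 'B' as well.
    static int predictThreeChars(String s, int pos) {
        int alt = predictTwoChars(s, pos);
        if (alt == 2 && la(s, pos, 3) != 'B') {
            return 3; // "BA" not followed by 'B': treat the leading 'B' as a SYMBOL
        }
        return alt;
    }

    public static void main(String[] args) {
        String line = "BAA\n"; // line 4 of the test input
        System.out.println(predictTwoChars(line, 0));   // prints 2 (BAB, which then fails to match)
        System.out.println(predictThreeChars(line, 0)); // prints 3 (SYMBOL)
    }
}
```

With two characters of lookahead "BAA" is predicted as BAB, which matches
the error reported at 4:2; with the third character checked it would be
predicted as SYMBOL.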
Is this a problem with ANTLR or my expectations?
Cheers,
Ned.