[il-antlr-interest: 27262] Re: [antlr-interest] Lexer code not generated as expected?

Jim Idle Tue, 15 Dec 2009 09:25:23 -0800

Your rules are ambiguous: CUTLINE and NEWLINE both start with '\n', so when 
ANTLR finds a \n followed by a space or a '+', it commits to CUTLINE. The 
analysis only looks ahead 'enough' to start down a path (it is not a system 
that tries to match rules in order, like flex). You have to be more specific 
with the lexer here if you want that kind of behavior:


fragment NEWLINE : ;
CUTLINE
: '\n'
   (
       (' '* '+')=> ' '* '+' { skip(); }
     | { $type = NEWLINE; }
   )
;
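
To make the difference concrete, here is a minimal sketch in plain C, not ANTLR output, that contrasts the two decisions. The enum and both function names are hypothetical and exist only for this illustration. The first function mirrors the LA(2) switch in the generated mTokens quoted below; the second approximates what the (' '* '+')=> predicate asks for, by scanning past any spaces before committing. Both assume a NUL-terminated buffer whose first character is the '\n'.

```c
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
enum decision { DECIDE_CUTLINE, DECIDE_NEWLINE };

/* Mirrors the generated switch on LA(2): after '\n', a single following
 * ' ' or '+' is enough to commit to CUTLINE.
 * p must point at the '\n' of a NUL-terminated buffer. */
static enum decision decide_fixed_lookahead(const char *p)
{
    return (p[1] == ' ' || p[1] == '+') ? DECIDE_CUTLINE : DECIDE_NEWLINE;
}

/* Approximates the syntactic predicate (' '* '+')=> : skip any run of
 * spaces after the '\n' and commit to CUTLINE only if a '+' follows.
 * The scan stops at the terminating NUL, so it never reads past the buffer. */
static enum decision decide_with_predicate(const char *p)
{
    size_t i = 1;

    while (p[i] == ' ')
        i++;
    return (p[i] == '+') ? DECIDE_CUTLINE : DECIDE_NEWLINE;
}
```

For the input "\n " the first function picks CUTLINE, which then fails to match because no '+' follows, while the second picks NEWLINE and leaves the space to be matched by WS. For "\n   +" both pick CUTLINE.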

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of [email protected]
> Sent: Tuesday, December 15, 2009 7:11 AM
> To: [email protected]
> Subject: [antlr-interest] Lexer code not generated as expected?
> 
> Hello,
> 
> I have run into a strange problem using ANTLR and I wonder whether it is
> a bug.
> Here is part of my grammar:
> 
> WS
>     : ' ' {$channel=HIDDEN;}
>     ;
> 
> CUTLINE
>     : ('\n' ' '* '+') {$channel=HIDDEN;}
>     ;
> 
> NEWLINE
>     : '\n'
>     ;
> 
> and here is what antlr generates in the function mTokens:
> 
> static void
> mTokens(pAntlrTestbenchLexer ctx)
> {
>     {
>         //  antlr/AntlrTestbench.g:1:8: ( T__10 | WS | CUTLINE | NEWLINE | ID | INT )
> 
>         ANTLR3_UINT32 alt4;
> 
>         alt4=6;
> 
>         switch ( LA(1) )
>         {
> ...
>         case '\n':
>               {
>                       switch ( LA(2) )
>                       {
>                       case ' ':
>                       case '+':
>                               {
>                                       alt4=3; //CUTLINE
>                               }
>                           break;
> 
>                       default:
>                           alt4=4;}            //NEWLINE
> 
>               }
>             break;
> 
> ...
> 
> 
> This is not what I want: when the input of the lexer is "\n ", I would
> expect it to recognize the lexemes NEWLINE and WS, but the code above
> tries to recognize the lexeme CUTLINE and fails.
> Once a '\n' has been recognized, the lexer should look ahead to the
> first non-' ' character: if it is a '+', the correct alternative is the
> CUTLINE rule; otherwise, and only then, it is the NEWLINE rule.
> 
> The workaround I have found is to change the grammar this way:
> 
> NEWLINE
>     : '\n' ' '*
>     ;
> 
> Then it works as I want, but I find it strange to have to resolve the
> ambiguity this way.
> So is the C code generated by ANTLR correct, or is it a bug?
> 
> Thanks,
> Yann
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address





