Thanks a lot, I understand better what the problem was. Your solution is working fine.
Yann

> Message of 15/12/09, 18:25
> From: "Jim Idle" <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Subject: Re: [antlr-interest] Lexer code not generated as expected?
>
> Your rules are ambiguous, so when ANTLR finds a \n followed by a space
> or a '+', it recognizes CUTLINE. The analysis only looks ahead
> "enough" to start down the path (it is not a try-to-match-in-order
> system like flex). You have to be more specific in the lexer if you
> want that kind of behavior:
>
> fragment NEWLINE : ;
>
> CUTLINE
>     : '\n'
>       (
>           (' '* '+')=> ' '* '+' { skip(); }
>         | { $type = NEWLINE; }
>       )
>     ;
>
> Jim
>
> > -----Original Message-----
> > From: [email protected] [mailto:antlr-interest-[email protected]] On Behalf Of [email protected]
> > Sent: Tuesday, December 15, 2009 7:11 AM
> > To: [email protected]
> > Subject: [antlr-interest] Lexer code not generated as expected?
> >
> > Hello,
> >
> > I have run into a strange problem with ANTLR and I wonder whether it
> > is a bug or not.
> > Here is part of my grammar:
> >
> > WS
> >     : ' ' {$channel=HIDDEN;}
> >     ;
> >
> > CUTLINE
> >     : ('\n' ' '* '+') {$channel=HIDDEN;}
> >     ;
> >
> > NEWLINE
> >     : '\n'
> >     ;
> >
> > and here is what ANTLR generates in the function mTokens:
> >
> > static void
> > mTokens(pAntlrTestbenchLexer ctx)
> > {
> >     {
> >         // antlr/AntlrTestbench.g:1:8: ( T__10 | WS | CUTLINE | NEWLINE | ID | INT )
> >
> >         ANTLR3_UINT32 alt4;
> >
> >         alt4=6;
> >
> >         switch ( LA(1) )
> >         {
> >         ...
> >         case '\n':
> >             {
> >                 switch ( LA(2) )
> >                 {
> >                 case ' ':
> >                 case '+':
> >                     {
> >                         alt4=3; //CUTLINE
> >                     }
> >                     break;
> >
> >                 default:
> >                     alt4=4; //NEWLINE
> >                 }
> >             }
> >             break;
> >
> >         ...
> >
> > This is not what I want: when the lexer input is "\n ", I would expect
> > it to recognize the lexemes NEWLINE and WS, but with the code above it
> > tries to recognize the lexeme CUTLINE and fails.
> > Once a '\n' has been recognized, the lexer should look ahead to find
> > the first non-' ' character. If it is a '+', the correct alternative is
> > the CUTLINE rule; otherwise, and only then, the correct alternative is
> > the NEWLINE rule.
> >
> > The workaround I have found is to change the grammar this way:
> >
> > NEWLINE
> >     : '\n' ' '*
> >     ;
> >
> > Then it works as I want, but I find it strange to have to resolve the
> > ambiguity this way.
> > So is the C code generated by ANTLR correct, or is it a bug?
> >
> > Thanks,
> > Yann

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
