Ah. I thought they did first match, not longest...

On Sat, May 7, 2011 at 12:12 PM, John B. Brodie <[email protected]> wrote:
> Greetings!
>
> On Sat, 2011-05-07 at 10:06 -0400, Todd O'Bryan wrote:
>> Can anyone explain to me why tabs, spaces, and greater-thans at the
>> beginning of lines are ending up in TEXT tokens, rather than in INDENT
>> or QUOTE tokens, as I think they should?
>>
>> fragment SPECIAL_CHARS
>> : ('\n' | '[' | ']' | '*' | '/' |'=' | '^' | '_' | '8' | '@' | '#' |
>> '$' | '!' | '(' | ')' | '{' | '}' );
>>
>> INDENT : { getCharPositionInLine() == 0 }?=> (' '|'\t')+;
>> QUOTE : { getCharPositionInLine() == 0 }?=> '>';
>> TEXT : (~SPECIAL_CHARS)+;
>>
>> This is in a lexer grammar and I've omitted some other rules that
>> shouldn't (I don't think) have any bearing on this question.
>
> Currently ANTLR lexers greedily consume the longest possible sequence of
> acceptable characters for each token.
>
> So I think that when the characters that follow the '>' match TEXT e.g.
> are not one of the SPECIAL_CHARS then the entire sequence is matched as
> TEXT. and the same drill for the INDENT token.
>
> You can verify this by simply trying input such as ">$" or " $" -- each
> on a line by itself. I would think you would then get either a QUOTE or
> INDENT followed by whatever token matches a $. (Note, this may not parse
> correctly but you should still see the 2 token sequence...)
>
> Hope this helps...
> -jbb
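
To make the first-match versus longest-match difference concrete, here is a rough Python sketch. It is only a toy model of the behaviour John describes, not ANTLR's actual lexer algorithm. The three rules are transcribed from the grammar fragment quoted above; the OTHER rule, the function names, and the tie-break (earlier rule wins when lengths are equal) are my own assumptions, standing in for the rules that were omitted.

```python
# Toy model only: this is NOT how ANTLR is implemented.
# Characters excluded from TEXT, copied from the SPECIAL_CHARS fragment.
SPECIAL = set("\n[]*/=^_8@#$!(){}")


def match_indent(s, i):
    """Length of the run of spaces/tabs starting at i (0 if none)."""
    j = i
    while j < len(s) and s[j] in " \t":
        j += 1
    return j - i


def match_quote(s, i):
    """1 if the character at i is '>', else 0."""
    return 1 if s[i] == ">" else 0


def match_text(s, i):
    """Length of the run of non-special characters starting at i."""
    j = i
    while j < len(s) and s[j] not in SPECIAL:
        j += 1
    return j - i


def match_other(s, i):
    """Hypothetical catch-all standing in for the omitted rules."""
    return 1


# (name, matcher, only valid at column 0), in grammar order.
RULES = [
    ("INDENT", match_indent, True),
    ("QUOTE", match_quote, True),
    ("TEXT", match_text, False),
    ("OTHER", match_other, False),
]


def tokenize(s, longest=True):
    """Split s into (rule, text) pairs using longest-match or first-match."""
    tokens, i = [], 0
    while i < len(s):
        at_line_start = i == 0 or s[i - 1] == "\n"
        candidates = []
        for name, matcher, needs_line_start in RULES:
            if needs_line_start and not at_line_start:
                continue  # models the getCharPositionInLine() == 0 predicate
            n = matcher(s, i)
            if n > 0:
                candidates.append((name, n))
        if longest:
            # max() keeps the first candidate on ties, so earlier rules win.
            name, n = max(candidates, key=lambda c: c[1])
        else:
            name, n = candidates[0]
        tokens.append((name, s[i:i + n]))
        i += n
    return tokens


print(tokenize(">hello"))                 # longest match
print(tokenize(">hello", longest=False))  # first match
print(tokenize(">$"))
```

Under longest match the '>' is swallowed into TEXT because TEXT can keep going past it; under first match QUOTE would win. With ">$" both QUOTE and TEXT can only match one character, so in this model the earlier rule (QUOTE) is chosen, which matches the two-token sequence John predicts.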
