Ah. I thought they did first match, not longest...
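
To convince myself of the difference, here is a minimal sketch of the two policies applied to the rules quoted below. It is a toy model only, not how ANTLR's generated lexer actually works. MatchPolicyDemo, tokenize and the OTHER catch-all rule are names made up for illustration, and the column-0 check stands in for the gated predicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPolicyDemo {
    // Hypothetical rule table mirroring the quoted grammar; order matters.
    // OTHER is an assumed catch-all so that every character yields a token.
    static final String[] NAMES = {"INDENT", "QUOTE", "TEXT", "OTHER"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("[ \\t]+"),
        Pattern.compile(">"),
        Pattern.compile("[^\\n\\[\\]*/=\\^_8@#$!(){}]+"),
        Pattern.compile(".", Pattern.DOTALL)
    };

    // longest == false: take the first rule (in order) that matches.
    // longest == true: take the rule with the longest match; ties go to the earlier rule.
    static List<String> tokenize(String input, boolean longest) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean lineStart = pos == 0 || input.charAt(pos - 1) == '\n';
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < PATTERNS.length; i++) {
                // INDENT and QUOTE are only allowed at column 0.
                if (i < 2 && !lineStart) continue;
                Matcher m = PATTERNS[i].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                    if (!longest) break;
                }
            }
            out.add(NAMES[best] + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(">hello", false)); // [QUOTE(>), TEXT(hello)]
        System.out.println(tokenize(">hello", true));  // [TEXT(>hello)]
        System.out.println(tokenize(">$", true));      // [QUOTE(>), OTHER($)]
    }
}
```

Under the longest-match policy the '>' is swallowed by TEXT unless the next character is one TEXT cannot match, which is the behaviour described in the reply below.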

On Sat, May 7, 2011 at 12:12 PM, John B. Brodie <[email protected]> wrote:
> Greetings!
>
> On Sat, 2011-05-07 at 10:06 -0400, Todd O'Bryan wrote:
>> Can anyone explain to me why tabs, spaces, and greater-thans at the
>> beginning of lines are ending up in TEXT tokens, rather than in INDENT
>> or QUOTE tokens, as I think they should?
>>
>> fragment SPECIAL_CHARS
>>       : ('\n' | '[' | ']' | '*' | '/' | '=' | '^' | '_' | '8' | '@' | '#'
>>       | '$' | '!' | '(' | ')' | '{' | '}' );
>>
>> INDENT        : { getCharPositionInLine() == 0 }?=> (' '|'\t')+;
>> QUOTE : { getCharPositionInLine() == 0 }?=> '>';
>> TEXT  : (~SPECIAL_CHARS)+;
>>
>> This is in a lexer grammar and I've omitted some other rules that
>> shouldn't (I don't think) have any bearing on this question.
>
> Currently ANTLR lexers greedily consume the longest possible sequence of
> acceptable characters for each token.
>
> So I think that when the characters that follow the '>' match TEXT, i.e.
> are not one of the SPECIAL_CHARS, the entire sequence is matched as
> TEXT. The same goes for the INDENT token.
>
> You can verify this by trying input such as ">$" or " $", each on a
> line by itself. I would expect you to get either a QUOTE or an INDENT
> token, followed by whatever token matches a $. (Note: this may not parse
> correctly, but you should still see the two-token sequence.)
>
> Hope this helps...
>   -jbb
