[il-antlr-interest: 29111] [antlr-interest] Multiple lexer tokens per rule

Ken Williams Thu, 03 Jun 2010 13:42:21 -0700

Both the DAR book and the Javadoc
(http://www.antlr.org/api/ActionScript/org/antlr/runtime/Lexer.html#emitToken())
mention that if you want to emit multiple tokens for a single lexer rule, you
need to override emit() or emitToken().  Does anyone have any examples of
doing that?


I assume nextToken() would also need to be overridden.
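
The usual shape of this idea is a queue of pending tokens: emit() appends to
the queue instead of overwriting a single slot, and nextToken() drains the
queue before matching anything new. The following is a minimal standalone
sketch of that pattern in plain Java. It does not use the ANTLR runtime at
all; QueueingLexer, Tok, and matchRule are hypothetical names chosen for
illustration, and a real ANTLR lexer would override its own emit() and
nextToken() rather than define new ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Standalone illustration only: none of these classes come from ANTLR.
class QueueingLexer {
    // Hypothetical minimal token: a type name plus the matched text.
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    private final String input;
    private int pos = 0;
    // Tokens produced by a rule but not yet handed to the caller.
    private final Deque<Tok> pending = new ArrayDeque<>();

    QueueingLexer(String input) { this.input = input; }

    // A rule may call emit() any number of times; tokens are queued, not overwritten.
    void emit(Tok t) { pending.addLast(t); }

    // Drain the queue first; run a rule only when nothing is pending.
    // The loop handles rules that emit nothing (skipped whitespace).
    Tok nextToken() {
        while (pending.isEmpty()) {
            if (pos >= input.length()) return new Tok("EOF", "");
            matchRule();
        }
        return pending.removeFirst();
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Toy rule: a digit run directly followed by ',' yields two tokens in one match.
    private void matchRule() {
        char c = input.charAt(pos);
        if (isDigit(c)) {
            int start = pos;
            while (pos < input.length() && isDigit(input.charAt(pos))) pos++;
            emit(new Tok("DIGITS", input.substring(start, pos)));
            if (pos < input.length() && input.charAt(pos) == ',') {
                emit(new Tok("PUNC", ","));
                pos++;
            }
        } else if (c == ' ') {
            pos++; // whitespace is skipped and emits nothing
        } else {
            emit(new Tok("OTHER", String.valueOf(c)));
            pos++;
        }
    }

    public static void main(String[] args) {
        QueueingLexer lexer = new QueueingLexer("23, 7");
        for (Tok t = lexer.nextToken(); !t.type.equals("EOF"); t = lexer.nextToken()) {
            System.out.println(t);
        }
    }
}
```

The caller only ever sees nextToken(), so it cannot tell whether two tokens
came from one rule match or from two.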


In case this is an XY Problem
(http://www.perlmonks.org/index.pl?node_id=542341), my actual use case is to
tokenize input as in the following examples:

23      -> DIGITS
23,     -> DIGITS PUNC
23,450  -> NUMERIC
23,450, -> NUMERIC PUNC

To do that, I'm using a lexer rule that consumes all the digits plus any
punctuation permitted inside a number, then I fix up the result afterwards:

-----------------------
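token   : ...
        | DIGITS
        | NUMERIC -> {fixNum($text)}   // parser-level fix-up
        | PUNC
        ;

PUNC    : '-' | ',' | '.' ;
fragment DIGIT : '0'..'9' ;

// Matches digits plus embedded punctuation; retyped to DIGITS when the text is
// all digits (assumes DIGITS is declared elsewhere, e.g. in a tokens section).
NUMERIC : DIGIT (DIGIT | PUNC)*
          {if ($text.matches("^[0-9]+$")) {$type=DIGITS;}} ;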
-----------------------

My fixNum() method is trying to fix things up at the parser level, but I
really want to do it in the lexer.

An alternative might be to "push back" any trailing punctuation onto the
input stream.  Is that possible?
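
Whichever mechanism ends up being used, the fix-up itself amounts to peeling
trailing punctuation off the matched text and classifying what remains. Here
is a minimal sketch of that step in plain Java, independent of ANTLR.
NumericSplitter and split() are hypothetical names, tokens are shown as
"TYPE:text" strings purely for illustration, and the input is assumed to
start with a digit, as the NUMERIC rule guarantees.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only: not ANTLR API.
class NumericSplitter {
    // Same characters as the PUNC rule in the grammar: '-' | ',' | '.'
    static boolean isPunc(char c) { return c == '-' || c == ',' || c == '.'; }

    // Takes text matched by DIGIT (DIGIT | PUNC)* and returns "TYPE:text" entries:
    // one DIGITS or NUMERIC entry, then one PUNC entry per trailing punctuation char.
    static List<String> split(String text) {
        // Walk back over trailing punctuation to find the end of the number proper.
        int end = text.length();
        while (end > 0 && isPunc(text.charAt(end - 1))) end--;

        List<String> out = new ArrayList<>();
        String core = text.substring(0, end);
        // All digits means DIGITS; embedded punctuation means NUMERIC.
        boolean allDigits = core.matches("[0-9]+");
        out.add((allDigits ? "DIGITS" : "NUMERIC") + ":" + core);

        // Each trailing punctuation character becomes its own PUNC token.
        for (int i = end; i < text.length(); i++) {
            out.add("PUNC:" + text.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"23", "23,", "23,450", "23,450,"}) {
            System.out.println(s + " -> " + split(s));
        }
    }
}
```

Inside a lexer, the entries returned by split() would each be emitted as a
separate token, or the trailing characters would be handed back to the input
if the runtime allows rewinding.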


-- 
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
[email protected]


