You need to use a collection that gives out the entries in the order they were added:
http://java.sun.com/docs/books/tutorial/collections/interfaces/queue.html

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of Ken Williams
> Sent: Friday, June 04, 2010 9:02 AM
> To: Junkman
> Cc: ANTLR list
> Subject: Re: [antlr-interest] Multiple lexer tokens per rule
>
> On 6/3/10 5:36 PM, "Junkman" <[email protected]> wrote:
>
> > Try this to get you started: [...]
>
> Thanks, that's a good start. There's still some bookkeeping I'm not
> getting, though. I seem to have to queue them in the reverse order that I
> want them out - in the Lexer I do 'queueUp(tok1); emit(tok2);' and then in
> nextToken() I return the queued token first. But then for some reason I get
> the tokens in the sequence 'tok2 tok1'.
>
> It seems like maybe somewhere in the generated code, something's accessing
> tokens directly in the 'state' member, or something's getting confused by
> using 'index', or something like that.
>
> My complete [toy] grammar is below. When I use it, I get the following
> results:
>
> 23      -> DIGITS        *good*
> 23,     -> PUNC DIGITS   *bad*
> 23,450  -> NUMERIC       *good*
> 23,450, -> PUNC NUMERIC  *bad*
>
> ----------------------------------------------------
> grammar testg;
>
> options { backtrack=true; memoize=true; output=AST; }
>
> tokens { PUNC; DIGITS; }
>
> @lexer::header{
> package com.tr.research.cites;
> import java.util.regex.Pattern;
> import java.util.regex.Matcher;
> }
> @parser::header{ package com.tr.research.cites; }
>
> @lexer::members {
>   protected Pattern trailingPunc = Pattern.compile("[^0-9]+$");
>   protected void fixNum(String text) {
>     if (text.matches("^[0-9]+$")) { emit(new CommonToken(DIGITS, text)); return; }
>     if (text.matches("^.*[0-9]+$")) { emit(new CommonToken(NUMERIC, text)); return; }
>
>     Matcher m = trailingPunc.matcher(text);
>     if (!m.find())
>       throw new RuntimeException("Can't figure out numeric token '" + text + "'");
>
>     String prefix = text.substring(0, m.start());
>     String suffix = text.substring(m.start());
>
>     queueUp( new CommonToken(prefix.matches("^[0-9]+$") ? DIGITS : NUMERIC, prefix) );
>     emit(new CommonToken( PUNC, suffix ));
>   }
>
>   // Queue to hold additional tokens
>   private java.util.Queue<Token> tokenQueue = new java.util.LinkedList<Token>();
>
>   // Include queue in reset().
>   public void reset() {
>     super.reset();
>     tokenQueue.clear();
>   }
>
>   // Queued tokens are returned before matching a new token.
>   public Token nextToken() {
>     return tokenQueue.isEmpty() ? super.nextToken() : tokenQueue.poll();
>   }
>
>   public void queueUp(Token t) {
>     tokenQueue.add(t);
>   }
> }
>
> cite : token+ EOF ;
> token : DIGITS | NUMERIC | PUNC ;
> WS : ( ' ' | '\t'| '\f' | '\n' | '\r' ) {skip();} ;
>
> fragment DIGIT : '0'..'9' ;
> NUMERIC : DIGIT (DIGIT | '-' | ',' | '.')* {fixNum($text);} ;
> ----------------------------------------------------
>
> --
> Ken Williams
> Sr. Research Scientist
> Thomson Reuters
> Phone: 651-848-7712
> [email protected]
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups "il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
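A minimal, ANTLR-free sketch of the first-in, first-out idea, assuming Java 17 (for the record type). Tok, MiniLexer, runRule() and tokenize() are hypothetical stand-ins for the generated lexer, not the ANTLR API. The sketch is built around one point about the quoted grammar. In the ANTLR 3 Java runtime, emit(Token) stores the token in state.token, and super.nextToken() returns it as soon as the rule finishes. So the PUNC token handed to emit() comes back on the current call, while the token added with queueUp() waits for the following call. That matches the "PUNC DIGITS" output, even though a LinkedList used through add()/poll() is already first-in, first-out. In the sketch, every token a rule produces goes through the same queue, so they come out in the order they were added:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a lexer that produces several tokens per rule.
// None of these names are ANTLR API; they only illustrate FIFO ordering.
public class MultiTokenDemo {

    // A minimal token: type name plus matched text.
    public record Tok(String type, String text) {
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    static class MiniLexer {
        private final List<String> words; // pre-split input; each word stands in for one rule match
        private int pos = 0;
        // FIFO queue: poll() hands tokens back in the order add() inserted them.
        private final Queue<Tok> pending = new ArrayDeque<>();

        MiniLexer(List<String> words) { this.words = words; }

        // Every token a rule produces goes through the queue (nothing bypasses it).
        void emit(Tok t) { pending.add(t); }

        // Stand-in for the NUMERIC rule's action: split trailing punctuation off a number.
        private void runRule(String text) {
            int cut = text.length();
            while (cut > 0 && !Character.isDigit(text.charAt(cut - 1))) cut--;
            String prefix = text.substring(0, cut);
            String suffix = text.substring(cut);
            emit(new Tok(prefix.matches("[0-9]+") ? "DIGITS" : "NUMERIC", prefix)); // first out
            if (!suffix.isEmpty()) emit(new Tok("PUNC", suffix));                     // second out
        }

        // Drain queued tokens first; only match new input when the queue is empty.
        Tok nextToken() {
            if (pending.isEmpty()) {
                if (pos >= words.size()) return new Tok("EOF", "");
                runRule(words.get(pos++));
            }
            return pending.poll();
        }
    }

    // Collect all tokens up to (not including) EOF.
    public static List<Tok> tokenize(String input) {
        MiniLexer lexer = new MiniLexer(List.of(input.trim().split("\\s+")));
        List<Tok> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); !t.type().equals("EOF"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"23", "23,", "23,450", "23,450,"}) {
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

Running main() prints `23 -> [DIGITS(23)]`, `23, -> [DIGITS(23), PUNC(,)]`, `23,450 -> [NUMERIC(23,450)]` and `23,450, -> [NUMERIC(23,450), PUNC(,)]`. In the quoted grammar itself, the equivalent ordering would come from passing the prefix token to emit() (so super.nextToken() returns it first) and the PUNC token to queueUp(). This has not been tested against the grammar above.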
