Why do you have to copy the token? You just pass a pointer to it, and when you want the text, use the pointers in the token.
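As a rough illustration of the pointer approach, here is a minimal C sketch. The names `TokenView`, `strip_delimiters`, `token_length` and `token_equals` are hypothetical and made up for this example; they are not the ANTLR3 C runtime structures or API. The sketch only assumes a token that records where its text starts and stops inside the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lexer token (NOT the ANTLR3 C struct):
 * it points into the input buffer and owns no text of its own. */
typedef struct {
    const char *start;  /* first character of the token text */
    const char *stop;   /* last character of the token text, inclusive */
} TokenView;

/* Narrow the view by one character on each side, which drops the
 * wrapper quotes or brackets. No bytes are copied or modified. */
void strip_delimiters(TokenView *t)
{
    assert(t->stop > t->start);  /* need an opening and a closing delimiter */
    ++t->start;
    --t->stop;
}

/* Number of characters the view currently covers. */
size_t token_length(const TokenView *t)
{
    return (size_t)(t->stop - t->start + 1);
}

/* Compare the viewed text with a NUL-terminated string, reading it
 * straight out of the input buffer. Returns 1 on a match, else 0. */
int token_equals(const TokenView *t, const char *s)
{
    size_t n = token_length(t);
    return strlen(s) == n && memcmp(t->start, s, n) == 0;
}
```

For the input `SELECT "my col" FROM t`, a view that starts at the opening quote and stops at the closing quote covers `my col` once `strip_delimiters` has run, and the input buffer itself is never touched.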
Your solution is fine, but I don't think it works in all cases of fragments; I cannot remember why just now. There are solutions at antlr.markmail.org.

Jim

> -----Original Message-----
> From: Ruslan Zasukhin [mailto:[email protected]]
> Sent: Sunday, April 17, 2011 5:38 AM
> To: [email protected]; Jim Idle
> Subject: Re: [antlr-interest] v2->v3 Skip chars in Lexer? For C-target
> [SOLVED 2.5]
>
> Hi All,
>
> After Jim pointed to a more effective way to skip wrapper quotes, and
> some more time, this is a working solution for the archive:
>
> //--------------------------------------------------------------------
> IDENT
>     : ( LETTER | '_' ) ( LETTER | '_' | DIGIT )*
>     ;
>
> // RZ 04/17/11: in ANTLR v3 there is no way to skip chars in the lexer. Oops.
> // Instead we do the trick suggested by Jim Idle on the ANTLR list:
> // skip the first/last chars of the token on the parser level.
> //
> DELIMITED // delimited_identifier
>     :
>     ( DQUOTE ( ~(DQUOTE) | DQUOTE DQUOTE )+ DQUOTE
>     | BQUOTE ( ~(BQUOTE) | BQUOTE BQUOTE )+ BQUOTE
>     | LBRACK ( ~(']') )+ RBRACK
>     )
>     ;
>
>
> And on the parser level, we use the Token and its pointers to ++ / --.
> Also the type of the Token is changed to IDENT with the help of a rewrite.
>
>
> //--------------------------------------------------------------------
> identifier
>     : IDENT // regular_identifier
>
>     | d=DELIMITED // delimited_identifier
>       {
>           ++$d->start;
>           --$d->stop;
>       }
>       -> ^( IDENT[$d.text->chars] )
>     ;
>
>
>
> ================
> Works... But ...
> I am far from sure that this solution is really more effective, Jim.
>
> Yes, on the lexer level I have to use ->chars, and you say it is slower ...
>
> But on the parser level, apart from the fast ++ / -- operations, we still
> need to create a second token IDENT and copy all values from the first ...
>
> Sizeof( ANTLR3_COMMON_TOKEN_struct ) is about 160-200 bytes.
>
> So creating it with new and copying about 150 bytes to skip TWO chars does
> not look like such a cheap operation. Also note that IDENTs are usually
> 5-20 chars only. Much less than the 200 bytes of that structure.
>
>
> And maybe my first solution on the lexer level was not so bad?
>
> And I still have a TODO: skip chars inside of LITERAL on the parser level ...
> here we cannot do just ++ / --
>
>
> ================
> I do not yet see the whole picture of how the lexer works at the low level in C.
>
> Also I do not yet see any clear information about UTF encodings in the
> C-target.
> I am going to ask about this in future letters.
>
>
> --
> Best regards,
>
> Ruslan Zasukhin
> VP Engineering and New Technology
> Paradigma Software, Inc
>
> Valentina - Joining Worlds of Information
> http://www.paradigmasoft.com
>
> [I feel the need: the need for speed]
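On the LITERAL TODO in the quoted message: doubled quotes sit in the middle of the text, so moving the start and stop pointers cannot remove them, and some copying is unavoidable. The C sketch below shows one way this could look. The function `unescape_doubled` and its signature are hypothetical, written for this example only, and are not part of the ANTLR3 C runtime. It assumes the outer quotes have already been excluded from the range and that the caller supplies the output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the inclusive range [start, stop] into out, collapsing each
 * doubled quote character into a single one, and NUL-terminate out.
 * Returns the number of characters written (not counting the NUL),
 * or (size_t)-1 if out is too small. Hypothetical helper. */
size_t unescape_doubled(const char *start, const char *stop,
                        char quote, char *out, size_t out_size)
{
    size_t n = 0;
    const char *p = start;

    assert(out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    while (p <= stop) {
        if (n + 1 >= out_size)   /* always keep room for the NUL */
            return (size_t)-1;
        out[n++] = *p;
        /* A quote followed by another quote is one escaped quote:
         * skip the second character of the pair. */
        if (*p == quote && p < stop && p[1] == quote)
            ++p;
        ++p;
    }
    out[n] = '\0';
    return n;
}
```

Because the result can only be shorter than or equal to the input, a buffer of the token's length plus one is always large enough. Tokens that contain no doubled quote could skip the copy entirely and keep using the pointers.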
