wow. So only disadvantage is not having a real object per token?
Ter
On Dec 1, 2010, at 3:28 PM, Sam Harwell wrote:
> Same test on the C target release build with full optimizations for speed
> (including ANTLR3_INLINE_INPUT_UTF16):
>
> * Overhead (32-bit): 148 bytes/token
> * Total parse time: 5.88s
> * Rate: 4.76 mil tokens/sec
>
> The tree implementation I proposed for C# offers a significant raw
> performance (speed) boost over the C target with optimizations, but uses
> less than 10 bytes/token. :)
>
> I imagine you could pick up a lot by sharing the API portion of your tokens
> (a pointer to a struct with the shared function pointers).
>
> Sam
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Jim Idle
> Sent: Wednesday, December 01, 2010 2:57 PM
> Cc: [email protected]
> Subject: Re: [antlr-dev] Alternative token storage mechanisms
>
> It does show how much overhead there is to such languages compared to C
> though :-)
>
> Jim
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Terence Parr
>> Sent: Wednesday, December 01, 2010 12:50 PM
>> To: Sam Harwell
>> Cc: Johannes Luber ([email protected]); [email protected]
>> Subject: Re: [antlr-dev] Alternative token storage mechanisms
>>
>> Hi Sam. Impressive. Is this all due to no object creation overhead?
>> Ter
>> On Dec 1, 2010, at 8:03 AM, Sam Harwell wrote:
>>
>>> Hi Dr. Parr,
>>>
>>> I revisited my old "slim parsing" work to again measure the performance
>>> difference against Lexer/CommonToken. Currently, SlimLexer/SlimToken has
>>> a limitation that it only stores type, channel, startIndex, and
>>> stopIndex. Each of these is limited to 16 bits. Originally I planned to
>>> use this for syntax highlighting, where I can work within those bounds.
>>> Now the basic metrics. These were tested on the following 4-function
>>> calculator lexer.
>>>
>>> tokens {
>>>     MUL='*';
>>>     DIV='/';
>>>     MOD='%';
>>>     ADD='+';
>>>     SUB='-';
>>> }
>>>
>>> IDENTIFIER
>>>     : ('a'..'z' | 'A'..'Z' | '_')
>>>       ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
>>>     ;
>>>
>>> NUMBER
>>>     : '0'..'9'+
>>>     ;
>>>
>>> WS
>>>     : (' ' | '\t' | '\n' | '\r' | '\f')*
>>>       {$channel = Hidden;}
>>>     ;
>>>
>>> Memory - CommonToken (32-bit system):
>>> * 8 bytes overhead for being a class
>>> * 36 bytes overhead for member variables
>>>
>>> Memory - CommonToken (64-bit system):
>>> * 16 bytes overhead for being a class (I believe that's the object
>>>   header size)
>>> * 44 bytes overhead for members
>>>
>>> Memory - SlimToken (32- or 64-bit systems):
>>> * 8 bytes total storage, and no allocations since it's a value type.
>>>
>>> Lexer speed - CommonToken:
>>> * Total time: 10.34s
>>> * Rate: 2.71 mil tokens/sec
>>>
>>> Lexer speed - SlimToken:
>>> * Total time: 2.87s
>>> * Rate: 9.76 mil tokens/sec
>>>
>>> My goal is to add enough CommonToken features back to SlimToken to make
>>> it usable without breaking its performance characteristics. To do so,
>>> I'm working on a new revision of SlimLexer that holds a ShortToken
>>> (backed by a 32-bit int) or LongToken (backed by a 64-bit int) (the
>>> lexer is generic in C#). The token itself stores its type (low 8 bits of
>>> ShortToken, 16 bits of LongToken), a flag for whether it's on the
>>> default channel or not (+/-), and 23 or 47 bits for the token index.
>>> As the lexer runs, it builds B-tree indexes for line lengths and token
>>> offsets (with token lengths derived). It also holds a map from
>>> Token->string so that it only has to track text when necessary. This
>>> gives O(1) access to the values that drive decision making (with
>>> (value & 0xFF) giving the token type for ShortToken), and O(log_b(n))
>>> access to other values. I expect to see a great improvement in
>>> performance with a very practical token for real parsing tasks.
>>>
>>> Sam
>>
>> _______________________________________________
>> antlr-dev mailing list
>> [email protected]
>> http://www.antlr.org/mailman/listinfo/antlr-dev
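
The two token layouts described in Sam's first message can be sketched in C
roughly as follows. This is only an illustration under stated assumptions, not
the SlimLexer code: the names slim_token, short_token, st_pack, st_type,
st_index and st_on_default_channel are hypothetical, and the bit positions
(type in the low 8 bits, a 23-bit token index above it, the top bit as the
channel flag) are one plausible reading of "low 8 bits", "23 bits" and "(+/-)".

```c
#include <assert.h>
#include <stdint.h>

/* SlimToken as described: four fields of 16 bits each, 8 bytes in total.
 * Field names follow the message; the struct itself is an assumption. */
typedef struct {
    uint16_t type;
    uint16_t channel;
    uint16_t start_index;
    uint16_t stop_index;
} slim_token;

/* ShortToken sketch: one 32-bit value (assumed layout).
 *   bits 0..7   token type
 *   bits 8..30  token index (23 bits)
 *   bit  31     set when the token is NOT on the default channel
 */
typedef uint32_t short_token;

#define ST_TYPE_BITS   8u
#define ST_TYPE_MASK   0xFFu
#define ST_INDEX_BITS  23u
#define ST_INDEX_MASK  ((1u << ST_INDEX_BITS) - 1u)
#define ST_HIDDEN_BIT  (1u << 31)

/* Pack type, index and channel flag into a single 32-bit token. */
static short_token st_pack(uint32_t type, uint32_t index, int hidden)
{
    assert(type <= ST_TYPE_MASK);    /* type must fit in 8 bits   */
    assert(index <= ST_INDEX_MASK);  /* index must fit in 23 bits */
    return (hidden ? ST_HIDDEN_BIT : 0u) | (index << ST_TYPE_BITS) | type;
}

/* O(1) accessors: a mask, or a shift plus a mask. */
static uint32_t st_type(short_token t)
{
    return t & ST_TYPE_MASK;
}

static uint32_t st_index(short_token t)
{
    return (t >> ST_TYPE_BITS) & ST_INDEX_MASK;
}

static int st_on_default_channel(short_token t)
{
    return (t & ST_HIDDEN_BIT) == 0;
}
```

For example, st_pack(5, 123456, 1) gives a token for which st_type returns 5,
st_index returns 123456, and st_on_default_channel returns 0. Offsets, lengths
and text are not stored in the token at all; in the design described above
they would be looked up by token index in the separate B-tree indexes and the
Token->string map.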
