Sam Harwell wrote:

Today I decided to evaluate the potential performance benefits of a “lightweight” lexer mode. I find that I often don’t need or use many of the fields in the token; the extreme case is syntax highlighters, which only need the token type and the start index in the line. For my experiment, I did the following:

 

* Create the generic interfaces ITokenSource<T> and ITokenStream<T>.

* Create the generic classes Lexer<T> and TokenStream<T> with no virtual calls on the fast path; the lexer also works directly on a string instead of one of the ICharStream types.

* Create a struct (in C#, this is an unboxed value type) with 2 shorts, for a total token size of 32 bits.
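
As a rough illustration of the last two points, here is a minimal sketch in C rather than C#. It is not the code from the experiment: the names FastToken, lex_line and the TOK_* constants are hypothetical, and it assumes a line never exceeds 65536 characters so that the start index fits in 16 bits. It shows a 4-byte token stored by value and a lexer loop that reads a plain character buffer with no indirect calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds for identifiers, integers and whitespace. */
enum { TOK_IDENT = 1, TOK_INT = 2, TOK_WS = 3, TOK_OTHER = 4 };

/* Two 16-bit fields: 32 bits per token, stored by value in an array. */
typedef struct {
    uint16_t type;   /* one of the TOK_* constants */
    uint16_t start;  /* start index of the token within the line */
} FastToken;

static int is_ident_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Lex s[0..len) into out[0..cap) and return the number of tokens written. */
static size_t lex_line(const char *s, size_t len, FastToken *out, size_t cap)
{
    size_t i = 0, n = 0;

    /* Assumption: every start index must fit in a uint16_t. */
    assert(len <= (size_t)UINT16_MAX + 1);

    while (i < len && n < cap) {
        size_t start = i;
        uint16_t type;
        char c = s[i];

        if (is_ident_start(c)) {
            do { i++; } while (i < len && (is_ident_start(s[i]) || is_digit(s[i])));
            type = TOK_IDENT;
        } else if (is_digit(c)) {
            do { i++; } while (i < len && is_digit(s[i]));
            type = TOK_INT;
        } else if (c == ' ' || c == '\t') {
            do { i++; } while (i < len && (s[i] == ' ' || s[i] == '\t'));
            type = TOK_WS;
        } else {
            i++;                 /* any other character is a one-char token */
            type = TOK_OTHER;
        }

        out[n].type = type;
        out[n].start = (uint16_t)start;
        n++;
    }
    return n;
}
```

Token text and length are not stored; a consumer that needs them can recover the length from the next token's start index, which is the trade-off that keeps the token at 32 bits.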

 

The test lexer recognizes C-style identifiers, whitespace, and integers. One copy is derived from Lexer, and the other from Lexer<T>.

 

The input for a single iteration is 25,000,000 Unicode chars, generated from 1,000,000 copies of "x-2356*Abte+32+eno/6623+y". I ran 5 iterations of each lexer before starting the timer to allow the JIT to compile the hot methods. I then timed 5 iterations of each; the totals over those 5 timed iterations are:
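
The shape of that measurement (build the repeated input, run untimed warm-up passes, then time a fixed number of passes) can be sketched as follows. This is a C sketch under stated assumptions, not the harness used for the numbers in this post: repeat_input, count_word_runs and time_passes are hypothetical names, count_word_runs is only a stand-in for a real lexer pass, and in C the warm-up passes only warm caches since there is no JIT.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build `copies` repetitions of `unit`; the caller frees the result. */
static char *repeat_input(const char *unit, size_t copies, size_t *out_len)
{
    size_t unit_len = strlen(unit);
    size_t total = unit_len * copies;
    char *buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < copies; i++)
        memcpy(buf + i * unit_len, unit, unit_len);
    buf[total] = '\0';
    *out_len = total;
    return buf;
}

/* Stand-in for a lexer pass: counts runs of letters, digits and '_'. */
static size_t count_word_runs(const char *s, size_t len)
{
    size_t runs = 0;
    int in_run = 0;
    for (size_t i = 0; i < len; i++) {
        int word = isalnum((unsigned char)s[i]) || s[i] == '_';
        if (word && !in_run)
            runs++;
        in_run = word;
    }
    return runs;
}

typedef size_t (*lex_fn)(const char *, size_t);

/* Run `warmup` untimed passes, then return CPU seconds for `timed` passes.
   The summed results go to *sink so the work cannot be optimized away. */
static double time_passes(lex_fn fn, const char *s, size_t len,
                          int warmup, int timed, size_t *sink)
{
    size_t total = 0;
    for (int i = 0; i < warmup; i++)
        total += fn(s, len);

    clock_t t0 = clock();
    for (int i = 0; i < timed; i++)
        total += fn(s, len);
    clock_t t1 = clock();

    *sink = total;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

With the parameters from this post, the call would use 1,000,000 copies of the 25-character string and 5 warm-up plus 5 timed passes. Note that the copies are concatenated without a separator, so the trailing "y" of one copy and the leading "x" of the next form a single identifier.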

 

Elapsed time (normal): 43.546875 seconds.

Elapsed time (fast): 7.078125 seconds.

 

Summary: For a particular task I perform very often, deriving from some slightly altered base classes yielded roughly a 6:1 time improvement (43.5 s vs. 7.1 s), substantially lowered memory overhead, and did not lose any information I needed. I’ll certainly be examining possibilities for wider use of this work in the future.

 

Hi Sam,

Send along your lexer, I would like to see how this compares with C (I presume your measurements are C#?). Also, what does profiling tell you about the difference in time? Object creation? Of course it is a fairly simple lexer, but in this case I think it is valid because then the time differences are isolated to those things that are to do with more complicated tokens.

I was going to do a simple C version as a target, but having used it in anger, C is already fast enough. We do need to do some performance improvement work, but I suspect that this will really happen when Ter is freed from having to work for a living for a while, coming up soon ;-)

Jim
_______________________________________________
antlr-dev mailing list
http://www.antlr.org/mailman/listinfo/antlr-dev
