There are several aspects of the CLI I was trying to take advantage of
here. Unfortunately, based on a paper I read this weekend, it appears
many of them don't carry over to the JVM with the same performance
benefits. I'm going to have to do a lot of thinking about how to
implement this cleanly.

The CLI allows the creation of user-defined unboxed value types. Beyond
that, it turns out that if one of those value types implements an
interface used as a generic constraint (e.g. class Foo<T> where T :
IToken), the instantiation of that generic class *won't* box the value
type into an IToken reference before working with it. The combination
results in a lexer that allocates a single array to back the List<T> of
unboxed tokens, but creates no other objects for the tokens themselves.
Based on my previous profiling of the lexers, this is responsible for
the majority of the performance improvement I described in the original
mail.
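To make the boxing point concrete, here is a minimal self-contained sketch (not ANTLR's actual API — the IToken and TokenBuffer<T> here are pared-down stand-ins I made up for illustration). Because T is instantiated as the struct type itself, member accesses like token.Type dispatch directly on the unboxed value; no IToken reference is ever created.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the real ANTLR IToken interface.
interface IToken
{
    int Type { get; set; }
}

// Unboxed value type: a single short, well under the 32 bits of FastToken.
struct SmallToken : IToken
{
    short _type;

    public int Type
    {
        get { return _type; }
        set { _type = (short)value; }
    }
}

// Generic consumer constrained to IToken. When instantiated as
// TokenBuffer<SmallToken>, the List<T> backing array holds the raw
// struct values; Add and TypeAt never box a token.
class TokenBuffer<T> where T : IToken
{
    readonly List<T> _tokens = new List<T>();

    public void Add(T token)
    {
        // token is a T (the struct itself), not an IToken reference,
        // so storing it copies 16 bits into the array -- no allocation.
        _tokens.Add(token);
    }

    public int TypeAt(int index)
    {
        // Constrained call on the struct; no box, no virtual dispatch.
        return _tokens[index].Type;
    }
}
```

Had TokenBuffer stored IToken references instead of T, every Add would box a SmallToken onto the heap, which is exactly the per-token allocation this design avoids.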

I made the methods non-virtual to take advantage of the inlining
abilities of the JIT. In implementing the IToken interface, my token
type has properties that are just stubs. Each of the operations on my
token is either trivially inlined or trivially removed.

Sam

For reference, here is the 32-bit unboxed value type I created for
testing.

public struct FastToken : IToken
{
    // Two shorts keep the whole token at 32 bits.
    short _type;
    short _charPositionInLine;

    #region IToken Members

    // Members the lexer doesn't need are stubs: the getters return
    // constants and the setters discard their values, so the JIT can
    // trivially inline or remove them.
    public string Text
    {
        get { return string.Empty; }
        set { }
    }

    public int Type
    {
        get { return _type; }
        set { _type = (short)value; }
    }

    public int Line
    {
        get { return 0; }
        set { }
    }

    public int CharPositionInLine
    {
        get { return _charPositionInLine; }
        set { _charPositionInLine = (short)value; }
    }

    public int Channel
    {
        get { return TokenChannels.Default; }
        set { }
    }

    public int TokenIndex
    {
        get { return 0; }
        set { }
    }

    public ICharStream InputStream
    {
        get { return null; }
        set { }
    }

    #endregion
}

-----Original Message-----
From: Terence Parr [mailto:[email protected]] 
Sent: Sunday, May 03, 2009 6:03 PM
To: Sam Harwell
Cc: ANTLR-dev Dev
Subject: Re: [antlr-dev] Interesting Lexer performance results

wow interesting results. Java doesn't have nonvirtual methods but
should do a good job of direct dispatch where possible. so the use of
the generic type allowed this code generation? well, I guess that
depends on avoiding the virtual method calls... I'll be dealing with
optimization sometime this summer I think.
Ter
On May 3, 2009, at 3:07 PM, Sam Harwell wrote:

> Today I decided to try to evaluate the potential performance
> benefits of a "lightweight" lexer mode. I find that I often don't
> need many of the items in the token; the limiting case is syntax
> highlighters, which only need the token type and start index in the
> line. For my experiment, I did the following:
>
> * Create the generic interfaces ITokenSource<T> and ITokenStream<T>.
> * Create the generic classes Lexer<T> and TokenStream<T> with no
> virtual functions in the fast path, including working on a string
> instead of one of the ICharStream types.
> * Create a struct (in C#, this is an unboxed value type) with two
> shorts for a total token size of 32 bits.
>
> The test lexer recognizes C-style identifiers, whitespace, and  
> integers. One copy is derived from Lexer, and the other from Lexer<T>.
>
> The input for a single iteration is 25000000 Unicode chars,  
> generated from 1000000 copies of "x-2356*Abte+32+eno/6623+y". I ran  
> 5 iterations of each lexer before starting the timer to allow the  
> JIT to compile the hot methods. I then timed 5 iterations of each,  
> and here is the sum result:
>
> Elapsed time (normal): 43.546875 seconds.
> Elapsed time (fast): 7.078125 seconds.
>
> Summary: For a particular task I perform very often, deriving from  
> some slightly altered base classes yielded a 6:1 time improvement,  
> substantially lowered memory overhead, and did not lose any  
> information I needed. I'll certainly be examining possibilities for  
> wider use of this work in the future.
>
> Sam Harwell
> Pixel Mine, Inc.
> _______________________________________________
> antlr-dev mailing list
> [email protected]
> http://www.antlr.org/mailman/listinfo/antlr-dev
