Hi George,
Thanks for this feedback. The method I was describing is a form of incremental lexing, but is quite different from the one you referenced. I'll be looking to see if I can combine each of their strengths as I work. :)

Common features:
* Both methods are incremental. Visual Studio's (VS) incremental lexing restarts at the beginning of the line, and General Incremental Lexing (GIL) restarts at the first affected token.
* Both methods stop/suspend/defer the incremental updating process when the last on-screen token is processed.

Strengths of GIL over VS:
* Allows lookahead past newlines.
* Doesn't have to restart at the beginning of a line.
* Allows true multiline tokens.

Strengths of VS over GIL:
* Able to incrementally parse recursive constructs, such as languages that allow nested /* */ block comments.
* Smaller lower bound on processing requirements.
* Much smaller memory overhead.

If this is correct, then from what I can tell it would be beneficial to use the method I described as long as you don't have very long lines of text. Also, the SlimToken is actually lighter than a flyweight token, but again it can only be used as long as you don't need more information than it's able to store.

Sam

From: George Scott [mailto:[email protected]]
Sent: Friday, May 22, 2009 4:09 PM
To: Sam Harwell
Cc: [email protected]; [email protected]
Subject: Re: [antlr-dev] Syntax highlighting and performance possibilities

Sam,

Have you looked at incremental lexing? I think it provides very good performance and is used by a number of IDEs. A great reference on incremental lexing is this paper:
http://harmonia.cs.berkeley.edu/papers/twagner-lexing.pdf

To reduce memory you can use flyweight tokens (one token instance shared by all token streams) for token types whose length does not vary. You can use this for keywords, common white-space patterns such as a single space, etc.
The trade-off is that you have to compute the start/stop indexes for tokens based on a nearby non-flyweight token and the known length of the flyweight. This is generally not a problem, since syntax highlighting finds a start token given a line number and walks forward in token order, so you can keep a running count.

When using incremental lexing with syntax highlighting, you generally only have to re-lex from the point of the edit to the token containing the last visible character on screen, so there is not a large cost even if editing at the beginning of the file. As the user scrolls the document, you continue lexing from the last token.

It is pretty straightforward to modify the ANTLR runtime to use these techniques.

George
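
For illustration, the flyweight-token idea and the running count described above might look roughly like the sketch below. This is only a minimal sketch in Python, not ANTLR runtime code: the names FlyweightToken, PositionedToken, flyweight, and spans are hypothetical, and inclusive stop indexes are an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class FlyweightToken:
    """Shared token for a fixed-text type; it stores no position (hypothetical)."""
    type: str
    text: str


@dataclass
class PositionedToken:
    """Ordinary token that records its own start offset (hypothetical)."""
    type: str
    text: str
    start: int


Token = Union[FlyweightToken, PositionedToken]

# One shared instance per (type, text) pair, reused by every token stream.
_FLYWEIGHTS: Dict[Tuple[str, str], FlyweightToken] = {}


def flyweight(type_: str, text: str) -> FlyweightToken:
    """Return the single shared instance for a fixed-length token."""
    key = (type_, text)
    if key not in _FLYWEIGHTS:
        _FLYWEIGHTS[key] = FlyweightToken(type_, text)
    return _FLYWEIGHTS[key]


def spans(tokens: Iterable[Token], anchor_start: int = 0) -> Iterator[Tuple[Token, int, int]]:
    """Walk forward in token order, yielding (token, start, inclusive stop).

    The running offset is advanced by each token's known length and is
    resynchronized whenever a token that stores its own position is seen.
    """
    offset = anchor_start
    for tok in tokens:
        if isinstance(tok, PositionedToken):
            offset = tok.start
        yield tok, offset, offset + len(tok.text) - 1
        offset += len(tok.text)


# Token stream for the text "if x == 10".
example = [
    flyweight("IF", "if"),
    flyweight("WS", " "),
    PositionedToken("ID", "x", 3),
    flyweight("WS", " "),
    flyweight("EQ", "=="),
    flyweight("WS", " "),
    PositionedToken("INT", "10", 8),
]

for tok, start, stop in spans(example):
    print(tok.type, start, stop)
```

In this sketch every single-space token is the same object, so the memory cost per occurrence is just a reference, while the walk in spans() recovers the positions that the shared instances cannot store.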
_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
