On 01/28/2013 12:59 PM, Dmitry Olshansky wrote:
28-Jan-2013 15:48, Johannes Pfau wrote:
...

But to be fair, that doesn't fit ranges very well. If you don't want to
do any allocation but still keep identifiers etc. in memory, this
basically means you have to keep the whole source in memory, and that is
conceptually an array and not a range.


Not the whole source, just enough to construct a table of all
identifiers. The source is awfully redundant because of repeated
identifiers, spaces, comments and whatnot. The set of unique identifiers
is rather small.
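The interning idea above could be sketched roughly like this in D; the names (`IdentifierTable`, `intern`) are illustrative, not from any actual lexer:

```d
// Hypothetical interning table: each distinct identifier is stored once,
// and every later occurrence maps to the same canonical string.
struct IdentifierTable
{
    private string[string] pool; // canonical copy, keyed by its own text

    // Returns the single canonical string for this identifier.
    string intern(string ident)
    {
        if (auto p = ident in pool)
            return *p;
        auto copy = ident.idup;  // one allocation per *unique* identifier
        pool[copy] = copy;
        return copy;
    }
}

unittest
{
    IdentifierTable tab;
    // Repeated identifiers cost nothing beyond the lookup:
    assert(tab.intern("foo") is tab.intern("foo"));
}
```

Since the table holds independent copies, the original source buffer need not be kept alive just for the identifiers.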


Source code is usually small. (Even std.datetime is 'only' 1.6 MB.) My own lexer-parser combination slices directly into the original source code for every token, in order to remember the exact source location, and the last time I measured, it ran faster than DMD's. I keep the source around for error reporting anyway:

tt.d:27:5: error: no member 'n' for type 'A'
    a.n=2;
    ^~~

Since the tokens point directly into the source code, it is not necessary to construct any other data structures in order to allow fast retrieval of the appropriate source code line.
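A minimal sketch of that scheme, assuming each token carries a slice into the full source (the names `Token`, `Loc` and `locate` are hypothetical):

```d
import std.algorithm.searching : count;
import std.string : lastIndexOf;

struct Token
{
    string text; // slice into the original source, no copy made
}

struct Loc
{
    size_t line, col; // both 1-based
}

// Because tok.text points into source, pointer arithmetic recovers the
// token's offset, and line/column fall out of a scan over the prefix.
Loc locate(string source, Token tok)
{
    size_t offset = tok.text.ptr - source.ptr;
    size_t line = 1 + count(source[0 .. offset], '\n');
    // lastIndexOf returns -1 when there is no newline; +1 then yields 0.
    size_t lineStart = lastIndexOf(source[0 .. offset], '\n') + 1;
    return Loc(line, offset - lineStart + 1);
}

unittest
{
    string src = "ab\ncd";
    auto tok = Token(src[3 .. 5]); // "cd" on line 2
    assert(locate(src, tok) == Loc(2, 1));
}
```

Printing the offending line for a diagnostic is then just another slice of the source, no side table needed.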

But it's clear that a general-purpose library might not want to impose this storage restriction upon its clients. I think it is somewhat helpful for speed, though. The other thing I do is buffer tokens in a contiguous ring buffer that grows if a lot of lookahead is requested.
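Such a growable ring buffer might look like the following; this is a sketch under my own naming, not the actual implementation:

```d
// Ring buffer of tokens with amortized growth; peek(n) gives lookahead
// without consuming. All names are illustrative.
struct TokenBuffer(T)
{
    private T[] buf;
    private size_t head, count;

    void push(T t)
    {
        if (count == buf.length)
            grow();
        buf[(head + count) % buf.length] = t;
        ++count;
    }

    // Peek n tokens ahead without consuming (n = 0 is the front token).
    ref T peek(size_t n)
    {
        assert(n < count);
        return buf[(head + n) % buf.length];
    }

    T pop()
    {
        assert(count > 0);
        auto t = buf[head];
        head = (head + 1) % buf.length;
        --count;
        return t;
    }

    private void grow()
    {
        // Copy in logical order so the new buffer starts at index 0.
        auto bigger = new T[buf.length ? buf.length * 2 : 16];
        foreach (i; 0 .. count)
            bigger[i] = buf[(head + i) % buf.length];
        buf = bigger;
        head = 0;
    }
}
```

The buffer stays contiguous in memory, so heavy lookahead only pays for the occasional copy during growth rather than per-token allocation.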

I think the best course of action is to just provide a hook to trigger
on every identifier encountered. That could be, as discussed earlier, a
delegate.
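As a sketch of the hook idea, the lexer could take a delegate and invoke it with a slice for every identifier, leaving storage policy entirely to the client (the lexing loop here is deliberately simplistic and the names are made up):

```d
import std.ascii : isAlpha, isAlphaNum;

alias IdentifierSink = void delegate(const(char)[] ident);

// Toy scanner: calls onIdentifier for each identifier-shaped run.
// The slice passed in points into source; the client decides whether
// to copy, intern, or ignore it.
void scanIdentifiers(string source, IdentifierSink onIdentifier)
{
    size_t i;
    while (i < source.length)
    {
        if (isAlpha(source[i]) || source[i] == '_')
        {
            size_t start = i;
            while (i < source.length
                   && (isAlphaNum(source[i]) || source[i] == '_'))
                ++i;
            onIdentifier(source[start .. i]);
        }
        else
            ++i;
    }
}

unittest
{
    string[] seen;
    scanIdentifiers("a.n = 2; foo_bar", (id) { seen ~= id.idup; });
    assert(seen == ["a", "n", "foo_bar"]);
}
```

A client that wants interning passes a delegate that copies into its own table; one that only counts identifiers allocates nothing at all.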

...

Maybe. I map identifiers to unique ids later, though, in the identifier AST node constructor.
