On 01/28/2013 12:59 PM, Dmitry Olshansky wrote:
28-Jan-2013 15:48, Johannes Pfau wrote:
...

But to be fair, that doesn't fit ranges very well. If you don't want to
do any allocation but still keep identifiers etc. in memory, this
basically means you have to keep the whole source in memory, and that is
conceptually an array and not a range.


Not the whole source, just enough to construct a table of all
identifiers. The source is awfully redundant because of repeated
identifiers, spaces, comments and whatnot. The set of unique identifiers
is rather small.
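The interning idea above could be sketched roughly like this in D; the names (`IdentifierTable`, `intern`) are illustrative, not from any actual lexer:

```d
// Hypothetical interning table: each distinct identifier is stored once,
// and every later occurrence maps to the same canonical string.
struct IdentifierTable
{
    private string[string] pool; // canonical copy, keyed by its own text

    // Returns the single canonical string for this identifier.
    string intern(string ident)
    {
        if (auto p = ident in pool)
            return *p;
        auto copy = ident.idup;  // one allocation per *unique* identifier
        pool[copy] = copy;
        return copy;
    }
}

unittest
{
    IdentifierTable tab;
    // Repeated identifiers cost nothing beyond the lookup:
    assert(tab.intern("foo") is tab.intern("foo"));
}
```

Since the table holds independent copies, the original source buffer need not be kept alive just for the identifiers.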


Source code is usually small. (Even std.datetime is 'only' 1.6 MB.) My own lexer-parser combination slices directly into the original source code for every token, in order to remember the exact source location, and the last time I measured, it ran faster than DMD's. I keep the source around for error reporting anyway:

tt.d:27:5: error: no member 'n' for type 'A'
    a.n=2;
    ^~~

Since the tokens point directly into the source code, it is not necessary to construct any other data structures in order to allow fast retrieval of the appropriate source code line.
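A minimal sketch of that scheme, assuming each token carries a slice into the full source (the names `Token`, `Loc` and `locate` are hypothetical):

```d
import std.algorithm.searching : count;
import std.string : lastIndexOf;

struct Token
{
    string text; // slice into the original source, no copy made
}

struct Loc
{
    size_t line, col; // both 1-based
}

// Because tok.text points into source, pointer arithmetic recovers the
// token's offset, and line/column fall out of a scan over the prefix.
Loc locate(string source, Token tok)
{
    size_t offset = tok.text.ptr - source.ptr;
    size_t line = 1 + count(source[0 .. offset], '\n');
    // lastIndexOf returns -1 when there is no newline; +1 then yields 0.
    size_t lineStart = lastIndexOf(source[0 .. offset], '\n') + 1;
    return Loc(line, offset - lineStart + 1);
}

unittest
{
    string src = "ab\ncd";
    auto tok = Token(src[3 .. 5]); // "cd" on line 2
    assert(locate(src, tok) == Loc(2, 1));
}
```

Printing the offending line for a diagnostic is then just another slice of the source, no side table needed.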

But it's clear that a general-purpose library might not want to impose this storage restriction upon its clients. I think it is somewhat helpful for speed, though. The other thing I do is buffer tokens in a contiguous ring buffer that grows if a lot of lookahead is requested.
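Such a growable ring buffer might look like the following; this is a sketch under my own naming, not the actual implementation:

```d
// Ring buffer of tokens with amortized growth; peek(n) gives lookahead
// without consuming. All names are illustrative.
struct TokenBuffer(T)
{
    private T[] buf;
    private size_t head, count;

    void push(T t)
    {
        if (count == buf.length)
            grow();
        buf[(head + count) % buf.length] = t;
        ++count;
    }

    // Peek n tokens ahead without consuming (n = 0 is the front token).
    ref T peek(size_t n)
    {
        assert(n < count);
        return buf[(head + n) % buf.length];
    }

    T pop()
    {
        assert(count > 0);
        auto t = buf[head];
        head = (head + 1) % buf.length;
        --count;
        return t;
    }

    private void grow()
    {
        // Copy in logical order so the new buffer starts at index 0.
        auto bigger = new T[buf.length ? buf.length * 2 : 16];
        foreach (i; 0 .. count)
            bigger[i] = buf[(head + i) % buf.length];
        buf = bigger;
        head = 0;
    }
}
```

The buffer stays contiguous in memory, so heavy lookahead only pays for the occasional copy during growth rather than per-token allocation.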

I think the best course of action is to just provide a hook to trigger
on every identifier encountered. That could be, as discussed earlier, a
delegate.
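As a sketch of the hook idea, the lexer could take a delegate and invoke it with a slice for every identifier, leaving storage policy entirely to the client (the lexing loop here is deliberately simplistic and the names are made up):

```d
import std.ascii : isAlpha, isAlphaNum;

alias IdentifierSink = void delegate(const(char)[] ident);

// Toy scanner: calls onIdentifier for each identifier-shaped run.
// The slice passed in points into source; the client decides whether
// to copy, intern, or ignore it.
void scanIdentifiers(string source, IdentifierSink onIdentifier)
{
    size_t i;
    while (i < source.length)
    {
        if (isAlpha(source[i]) || source[i] == '_')
        {
            size_t start = i;
            while (i < source.length
                   && (isAlphaNum(source[i]) || source[i] == '_'))
                ++i;
            onIdentifier(source[start .. i]);
        }
        else
            ++i;
    }
}

unittest
{
    string[] seen;
    scanIdentifiers("a.n = 2; foo_bar", (id) { seen ~= id.idup; });
    assert(seen == ["a", "n", "foo_bar"]);
}
```

A client that wants interning passes a delegate that copies into its own table; one that only counts identifiers allocates nothing at all.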

...

Maybe. I map identifiers to unique ids later, though, in the identifier AST node constructor.
