On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei Alexandrescu
wrote:
Vision
======
I'd been following the related discussions for a while, but I
made up my mind today as I was working on a C++ lexer. The C++
lexer is for Facebook's internal linter. I'm translating the
lexer from C++.
Before long I realized two simple things. First, I can't reuse
anything from Brian's code (without copying it and doing
surgery on it), although it is extremely similar to what I'm
doing.
Second, I figured that it is almost trivial to implement a
simple, generic, and reusable (across languages and tasks)
static trie searcher that takes a compile-time array with all
tokens and keywords and returns the token at the front of a
range with minimum comparisons.
Such a trie searcher is not intelligent, but is very composable
and extremely fast. It is just smart enough to do maximum munch
(e.g. interprets "==" and "foreach" as one token each, not
two), but is not smart enough to distinguish an identifier
"whileTrue" from the keyword "while" (it claims "while" was
found and stops right at the beginning of "True" in the
stream). This is for generality so applications can define how
identifiers work (e.g. Lisp allows "-" in identifiers but D
doesn't, etc.). The trie finder doesn't do numbers or comments
either. No regexen of any kind.
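Roughly, the interface could look like this sketch (the names
and the linear scan below are placeholders for the generated
trie, not an actual implementation):

import std.algorithm : startsWith;

// The compile-time token/keyword list the trie finder is built from.
static immutable string[] tokens = ["=", "==", "/*", "while", "foreach"];

// Maximum-munch finder: returns the token at the front of `input`,
// or null, and advances `input` past it. A real trie searcher would
// mix in nested switch statements generated from `tokens`; the
// linear scan here only keeps the sketch short.
string munch(ref string input)
{
    string best = null;
    foreach (tok; tokens)
        if (tok.length > best.length && input.startsWith(tok))
            best = tok;
    if (best !is null)
        input = input[best.length .. $];
    return best;
}

unittest
{
    auto src = "==foo";
    assert(munch(src) == "==" && src == "foo");      // one "==", not two "="
    src = "whileTrue";
    assert(munch(src) == "while" && src == "True");  // stops right before "True"
}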
The beauty of it all is that all of these more involved bits
(many of which are language specific) can be implemented
modularly and trivially as a postprocessing step after the trie
finder. For example the user specifies "/*" as a token to the
trie finder. Whenever a comment starts, the trie finder will
find and return it; then the user implements the alternate
grammar of multiline comments.
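Concretely, a comment handler layered on top might look like
this (lexBlockComment is a made-up name, not an actual
std.d.lexer API):

import std.string : indexOf;

// Illustrative postprocessing handler: once the trie finder has
// reported a "/*" token, the caller applies the multiline comment
// grammar itself and resumes lexing after "*/".
string lexBlockComment(ref string input)
{
    auto end = input.indexOf("*/");
    assert(end >= 0, "unterminated block comment");
    auto text = input[0 .. end];
    input = input[end + 2 .. $];
    return text;
}

unittest
{
    // Input as the trie finder leaves it, just past the "/*".
    auto src = " hello */int x;";
    assert(lexBlockComment(src) == " hello ");
    assert(src == "int x;");
}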
That is more or less how SDC's lexer works. You pass it two
AAs: one mapping strings to token types, and one mapping
strings to the names of functions that return the actual token
(for instance to handle /*), plus a fallback for when nothing
matches.
A giant three-headed monster of a mixin is generated from these
data.
That has been really handy so far.
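Roughly the shape of it, as a sketch only (the names, tables,
and generator below are illustrative, not SDC's actual code):

import std.conv : to;

enum TokenType { Assign, Equal, BlockComment, Identifier, Invalid }

// String -> token type, for tokens the generated matcher emits directly.
enum tokenMap = [
    "="  : TokenType.Assign,
    "==" : TokenType.Equal,
];

// String -> name of the handler that lexes the rest (e.g. after "/*").
enum handlerMap = [
    "/*" : "lexBlockComment",
];

// Handler name used when nothing in either table matches.
enum fallbackHandler = "lexIdentifier";

// CTFE helper that turns the tables into the body of one big switch;
// mixing in the returned string produces the dispatch code.
// makeToken, lexBlockComment and lexIdentifier are made-up names that
// only appear inside the generated string.
string generateDispatch()
{
    string code;
    foreach (prefix, type; tokenMap)
        code ~= `case "` ~ prefix ~ `": return makeToken(TokenType.`
            ~ type.to!string ~ `);` ~ "\n";
    foreach (prefix, handler; handlerMap)
        code ~= `case "` ~ prefix ~ `": return ` ~ handler ~ `(input);` ~ "\n";
    code ~= `default: return ` ~ fallbackHandler ~ `(input);` ~ "\n";
    return code;
}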
If what we need at this point is a conventional lexer for the D
language, std.d.lexer is the ticket. But I think it wouldn't be
difficult to push our ambitions way beyond that. What say you?
Yup, I do agree.