"Jonathan M Davis" <[email protected]> wrote in message 
news:[email protected]...
>
> I would point out that there is an intention to eventually get a D lexer
> and parser into Phobos so that tools can take advantage of them. Those
> could eventually lead to a frontend in D but would provide benefits far
> beyond simply having the compiler in D.
>

Is the interest more in a D-specific lexer/parser or a generalized one? Or 
is it more of a split vote? I seem to remember interest both ways, but I 
don't know whether there's any consensus among the DMD/Phobos crew.

A generalized lexer is nothing more than a regex engine that has more 
than one distinct accept state (which then gets run over and over until 
EOF). And the FSM is built simply by combining the token regexes into 
one, "(regexForToken1 | regexForToken2 | regexForToken3 | ... )", where 
each alternative gets its own accept state. Which makes me wonder...

There was a GSoC project to overhaul Phobos's regex engine, wasn't there? Is 
that done? Is it designed in a way that the stuff above wouldn't be real 
hard to add?
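
By "the stuff above" I mean something along these lines. This is just a 
toy sketch against std.regex's current API, with made-up token names and 
patterns (and whitespace gets skipped implicitly, since matchAll simply 
searches for the next hit):

import std.regex;
import std.stdio;

void main()
{
    // One combined regex; each named alternative plays the role of a
    // distinct accept state.
    auto tok = regex(`(?P<num>\d+)|(?P<ident>[a-zA-Z_]\w*)|(?P<op>[-+*/=])`);

    // Run the combined regex over and over until EOF.
    foreach (m; matchAll("x = 42 + y", tok))
    {
        if (m["num"].length)        writeln("NUM    ", m.hit);
        else if (m["ident"].length) writeln("IDENT  ", m.hit);
        else                        writeln("OP     ", m.hit);
    }
}

Each named alternative is standing in for a distinct accept state; a 
real lexer-mode API could presumably report which alternative matched 
more directly, instead of making you probe the captures.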

And what about the algorithm? Is it a Thompson NFA (i.e., it traverses 
the NFA as if it were a DFA, effectively "creating" the DFA 
on-the-fly)? Or does it just traverse the NFA as an NFA? Or does it 
create an actual DFA and traverse that? An actual DFA would probably be 
best for a lexer. If a DFA, is it an optimized (minimized) DFA? In my 
(limited) tests, it didn't seem like DFA minimization would yield a 
notable benefit on typical programming-language tokens. It seems to be 
more suited to pathological cases.
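
For reference, by "traverses the NFA as if it were a DFA" I mean the 
usual set-of-states simulation, roughly like this hand-rolled toy 
(epsilon transitions omitted for brevity, and this has nothing to do 
with Phobos's actual internals):

import std.algorithm : any, sort, uniq;
import std.array : array;
import std.stdio;

// Toy NFA transition table: state -> input char -> possible next states.
alias Delta = int[][char][int];

bool simulate(Delta delta, int start, bool[int] accept, string input)
{
    int[] current = [start];  // a *set* of NFA states = one "DFA state"
    foreach (ch; input)
    {
        int[] next;
        foreach (s; current)
            if (auto edges = s in delta)
                if (auto targets = ch in *edges)
                    next ~= *targets;
        // Dedupe; this set is the next on-the-fly "DFA state".
        current = next.sort().uniq.array;
        if (current.length == 0)
            return false;
    }
    return current.any!(s => accept.get(s, false));
}

void main()
{
    // Hand-built NFA for "ab|ac".
    Delta delta;
    delta[0]['a'] = [1];
    delta[1]['b'] = [2];
    delta[1]['c'] = [2];
    bool[int] accept = [2: true];

    writeln(simulate(delta, 0, accept, "ab")); // true
    writeln(simulate(delta, 0, accept, "ad")); // false
}

The point is that each distinct set of NFA states acts as one DFA 
state, computed lazily instead of ahead of time; building an actual DFA 
essentially just means caching those sets.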

