On 29.09.2011 0:43, Nick Sabalausky wrote:
"Jonathan M Davis"<[email protected]>  wrote in message
news:[email protected]...

I would point out that there is an intention to eventually get a D lexer
and parser into Phobos so that tools can take advantage of them. Those
could eventually lead to a frontend in D but would provide benefits far
beyond simply having the compiler in D.


Is the interest more in a D-specific lexer/parser or a generalized one? Or
is it more of a split vote? I seem to remember interest both ways, but I
don't know whether there's any consensus among the DMD/Phobos crew.

A generalized lexer is nothing more than a regex engine that has more than
one distinct accept state (and which then gets run over and over until EOF).
The FSM is built simply by combining the token regexes into one alternation,
"(regexForToken1 | regexForToken2 | regexForToken3 | ... )", where each
alternative gets its own accept state. Which makes me wonder...
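The idea above can be sketched in a few lines of Python (a toy illustration,
not Phobos code; the token names and patterns are made up for the example).
Each named group in the combined alternation plays the role of a distinct
accept state:

```python
import re

# Hypothetical token definitions -- purely illustrative.
token_specs = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

# Combine the per-token regexes into one big alternation; the named
# group that matched tells us which "accept state" we landed in.
master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in token_specs))

def tokenize(text):
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        if m.lastgroup != "SKIP":        # drop whitespace tokens
            yield (m.lastgroup, m.group())
        pos = m.end()                    # run the engine again from here
```

So `list(tokenize("x = 42 + y"))` yields IDENT/OP/NUMBER pairs, which is
exactly the "run the combined regex over and over until EOF" loop.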

There was a GSoC project to overhaul Phobos's regex engine, wasn't there? Is
that done? Is it designed in a way that the stuff above wouldn't be real
hard to add?

That was me. And, yes, that's exactly the kind of thing I'm pursuing right now. A lexer with a compile-time generated FSM looks very promising, e.g. for DSELs. In fact, I think I already have a good idea of how to reuse the existing code to get there.


And what about the algorithm? Is it a Thompson NFA (i.e., it traverses the
NFA as if it were a DFA, effectively "creating" the DFA on the fly)?

As for the implementation, there are two algorithms operating on the same IR. The default one is Thompson NFA, so yes, with some work I can make it generate code for a DFA instead of executing it. It's just that until now I've generated simpler structures from it, e.g. a bit-parallel prefix searcher.
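For readers unfamiliar with the Thompson approach: instead of backtracking,
the simulation keeps the *set* of NFA states alive after each character, and
each such set is effectively one DFA state "created" on the fly. A minimal
sketch (the tiny hand-built NFA below, recognizing "ab*c", is purely
illustrative and has nothing to do with the actual Phobos IR):

```python
# Transition table of a toy NFA for the regex "ab*c".
NFA = {
    0: {"a": {1}},
    1: {"b": {1}, "c": {2}},
    2: {},                     # accept state
}
ACCEPT = {2}

def matches(s):
    states = {0}               # current set of live NFA states
    for ch in s:
        # Advance every live state on ch; the resulting set is the
        # next "on-the-fly DFA state".
        states = {t for st in states for t in NFA[st].get(ch, ())}
        if not states:         # no live states left: reject early
            return False
    return bool(states & ACCEPT)
```

This runs in time linear in the input length regardless of the pattern,
which is the usual argument for Thompson NFA over backtracking.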

 Or does it
just traverse the NFA as an NFA? Or does it create an actual DFA and
traverse that? An actual DFA would probably be best for a lexer. If a DFA,
is it an optimized DFA? In my (limited) tests, it didn't seem like
DFA-optimization would yield a notable benefit on typical
programming-language tokens. It seems to be more suited to pathological
cases.


I'm also dubious about whether DFA optimization techniques would be worth the hassle. Not a top priority.


--
Dmitry Olshansky
