"Jonathan M Davis" <[email protected]> wrote in message news:[email protected]... > > I would point out that there is an intention to eventually get a D lexer > and > parser into Phobos so that tools can take advantage of them. Those could > eventually lead to a frontend in D but would provide benefits far beyond > simply > having the compiler in D. >
Is the interest more in a D-specific lexer/parser or a generalized one? Or is it more of a split vote? I seem to remember interest both ways, but I don't know whether there's any consensus among the DMD/Phobos crew.

A generalized lexer is little more than a regex engine with more than one distinct accept state, run over and over until EOF. The FSM is built simply by forming a combined regex "(regexForToken1 | regexForToken2 | regexForToken3 | ... )" and giving each of those alternatives its own accept state (rough sketch at the end of this post).

Which makes me wonder... There was a GSoC project to overhaul Phobos's regex engine, wasn't there? Is that done? Is it designed in a way that the stuff above wouldn't be hard to add?

And what about the algorithm? Is it a Thompson NFA, i.e., does it traverse the NFA as if it were a DFA, effectively "creating" the DFA on the fly (second sketch below)? Or does it just traverse the NFA as an NFA? Or does it construct an actual DFA and traverse that? An actual DFA would probably be best for a lexer. If it's a DFA, is it minimized? In my (limited) tests, DFA minimization didn't seem to yield a notable benefit on typical programming-language tokens; it seems more suited to pathological cases.
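To make the combined-regex idea concrete, here's a rough sketch of the driver loop in D. It assumes a matchFirst-style API with named groups; the token set is invented purely for illustration and none of this is meant to reflect Phobos's actual regex interface:

import std.regex;
import std.stdio;

void main()
{
    // One combined pattern; each alternative is one token kind, and the
    // named group that fired tells us which "accept state" we hit.
    auto tok = regex(r"^(?:(?P<num>[0-9]+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=])|(?P<ws>\s+))");

    string src = "x = 42 + y";
    while (src.length)
    {
        auto m = matchFirst(src, tok);
        if (m.empty) { writeln("lex error at: ", src); break; }
        if (!m["num"].empty)     writeln("NUM ", m["num"]);
        else if (!m["id"].empty) writeln("ID  ", m["id"]);
        else if (!m["op"].empty) writeln("OP  ", m["op"]);
        // whitespace just gets skipped
        src = m.post;  // resume lexing at the first unconsumed character
    }
}

One caveat: plain alternation gives leftmost-alternative priority, not the maximal munch a real lexer wants, so alternative order matters (keywords before the identifier rule, "==" before "=", and so on). A combined DFA with per-alternative accept states gets longest-match behavior more directly, by running until it gets stuck and remembering the last accept state it passed through.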
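And to be clear about what I mean by "traverses the NFA as if it were a DFA": rather than backtracking through one NFA path at a time, Thompson's method advances the whole *set* of live NFA states in lockstep, and each such set is exactly one DFA state, built on the fly. A toy illustration, with a hard-coded (and entirely made-up) NFA for (a|b)*abb:

import std.algorithm : canFind, sort, uniq;
import std.array : array;
import std.stdio;

void main()
{
    // delta[state]['c'] = successor NFA states on character 'c'.
    // Epsilon moves omitted to keep the sketch short.
    int[][char][int] delta = [
        0 : ['a' : [0, 1], 'b' : [0]],
        1 : ['b' : [2]],
        2 : ['b' : [3]],
    ];
    enum accept = 3;

    bool matches(string s)
    {
        int[] live = [0];                  // the set {0} is the DFA start state
        foreach (char c; s)
        {
            int[] next;
            foreach (st; live)
                if (auto row = st in delta)
                    if (auto succ = c in *row)
                        next ~= *succ;
            live = next.sort().uniq.array; // the deduped set *is* one DFA state
            if (!live.length)
                return false;              // dead state: no path survives
        }
        return live.canFind(accept);
    }

    writeln(matches("abababb")); // true
    writeln(matches("abab"));    // false
}

Caching those state-sets the first time they're seen builds the real DFA incrementally; doing the construction for all of them up front is the ahead-of-time DFA case; skipping the cache entirely is the pure NFA simulation. Which of the three Phobos's engine does is exactly what I'm asking.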
