"Jonathan M Davis" wrote in message news:...
> On Wednesday, September 28, 2011 13:43 Nick Sabalausky wrote:
>> "Jonathan M Davis" wrote in message news:...
>>
>> > I would point out that there is an intention to eventually get a D
>> > lexer and parser into Phobos so that tools can take advantage of them.
>> > Those could eventually lead to a frontend in D but would provide
>> > benefits far beyond simply having the compiler in D.
>>
>> Is the interest more in a D-specific lexer/parser or a generalized one?
>> Or is it more of a split vote? I seem to remember interest both ways, but
>> I don't know whether there's any consensus among the DMD/Phobos crew.
>>
>> A generalized lexer is nothing more than a regex engine that has more
>> than one distinct accept state (which then gets run over and over until
>> EOF). The FSM is made simply by building one combined regex
>> "(regexForToken1 | regexForToken2 | regexForToken3 | ... )", and then
>> each of those parts just gets its own accept state. Which makes me
>> wonder...
>>
>> There was a GSoC project to overhaul Phobos's regex engine, wasn't there?
>> Is that done? Is it designed in a way that the stuff above wouldn't be
>> too hard to add?
>>
>> And what about the algorithm? Is it a Thompson NFA (i.e., it traverses
>> the NFA as if it were a DFA, effectively "creating" the DFA on the fly)?
>> Or does it just traverse the NFA as an NFA? Or does it create an actual
>> DFA and traverse that? An actual DFA would probably be best for a lexer.
>> If it's a DFA, is it an optimized DFA? In my (limited) tests, DFA
>> optimization didn't seem to yield a notable benefit on typical
>> programming-language tokens. It seems to be more suited to pathological
>> cases.
>
> There is some desire to have a lexer and parser in Phobos which basically
> have the same implementation as dmd (only in D instead of C++). That way,
> they're very close to the actual compiler, and it's easy to port fixes and
> improvements between the two.

The lexer seems like something that would change only on rare occasions.
Am I wrong?

> However, we definitely also want a more general lexer/parser generator
> which takes advantage of D's metaprogramming capabilities. Andrei was
> pushing more for that and doesn't really like the idea of the other, since
> it would reduce the desire to produce the more general solution. So, there
> _is_ some dissension on the matter. But there's definitely room for both.
> It's just a question of time and manpower.

I see.
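The combined-regex approach described in the quoted post can be illustrated with a small sketch. It is written in Python purely for illustration and makes no claim about how Phobos's regex engine or dmd's lexer actually work. The helper names (`cls`, `lit`, `cat`, `star`, `build_lexer`, `tokenize`), the token set, and the tie-breaking rule (longest match first, then the earliest-declared token) are all assumptions. Each token pattern becomes a Thompson-style NFA fragment. All fragments hang off one start state, as in "(regexForToken1 | regexForToken2 | ...)", and each keeps its own accept state labeled with its token name. The NFA is then traversed as a set of states, which is the "as if it were a DFA" option, and rerun from the end of each token until EOF.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Illustrative sketch only: a tiny Thompson-style NFA lexer. Nothing here is
# taken from Phobos or dmd; every name below is hypothetical.

@dataclass(eq=False)  # eq=False keeps identity hashing, so states fit in sets
class State:
    eps: List["State"] = field(default_factory=list)  # epsilon transitions
    edges: List[Tuple[Callable[[str], bool], "State"]] = field(default_factory=list)
    accept: Optional[str] = None  # token name if this is an accept state
    priority: int = 0             # lower wins when two tokens match equally long

Frag = Tuple[State, State]  # (start, end) of an NFA fragment

def cls(pred: Callable[[str], bool]) -> Frag:
    """One character satisfying pred."""
    s, e = State(), State()
    s.edges.append((pred, e))
    return s, e

def cat(*frags: Frag) -> Frag:
    """Concatenation: link each fragment's end to the next one's start."""
    for (_, a_end), (b_start, _) in zip(frags, frags[1:]):
        a_end.eps.append(b_start)
    return frags[0][0], frags[-1][1]

def lit(text: str) -> Frag:
    """A literal, non-empty string."""
    return cat(*(cls(lambda c, ch=ch: c == ch) for ch in text))

def star(frag: Frag) -> Frag:
    """Zero or more repetitions (Thompson construction)."""
    s, e = State(), State()
    s.eps += [frag[0], e]
    frag[1].eps += [frag[0], e]
    return s, e

def build_lexer(specs: List[Tuple[str, Frag]]) -> State:
    """Combined "(tok1 | tok2 | ...)" NFA: one start state with an epsilon
    edge to every token, and a distinct accept state per token."""
    start = State()
    for prio, (name, (s, e)) in enumerate(specs):
        e.accept, e.priority = name, prio
        start.eps.append(s)
    return start

def closure(states) -> set:
    """Epsilon closure of a collection of states."""
    stack, seen = list(states), set()
    while stack:
        st = stack.pop()
        if st not in seen:
            seen.add(st)
            stack.extend(st.eps)
    return seen

def tokenize(start: State, text: str) -> List[Tuple[str, str]]:
    """Run the NFA repeatedly until EOF, simulating it as a set of states.
    Assumption: longest match wins; ties go to the earliest-declared token."""
    pos, tokens = 0, []
    while pos < len(text):
        current, i, best = closure([start]), pos, None
        while current:
            accepts = [s for s in current if s.accept is not None]
            if accepts and i > pos:  # ignore empty matches
                best = (i, min(accepts, key=lambda s: s.priority).accept)
            if i == len(text):
                break
            ch = text[i]
            current = closure(t for s in current for pred, t in s.edges if pred(ch))
            i += 1
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        end, name = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens

letter = lambda c: c.isalpha() or c == "_"
digit = str.isdigit
specs = [  # hypothetical token set; KW_IF is listed first so it beats IDENT on "if"
    ("KW_IF", lit("if")),
    ("IDENT", cat(cls(letter), star(cls(lambda c: letter(c) or digit(c))))),
    ("NUMBER", cat(cls(digit), star(cls(digit)))),
    ("OP", cls(lambda c: c in "+-*/=<>")),
    ("WS", cat(cls(str.isspace), star(cls(str.isspace)))),
]
lexer = build_lexer(specs)
print([t for t in tokenize(lexer, "if iffy = x1 + 42") if t[0] != "WS"])
# [('KW_IF', 'if'), ('IDENT', 'iffy'), ('OP', '='), ('IDENT', 'x1'), ('OP', '+'), ('NUMBER', '42')]
```

Note that "iffy" lexes as a single IDENT even though the KW_IF accept state is reached after "if". The loop keeps running until the state set empties and remembers the last accept seen. Caching each distinct state set the first time it is reached would turn this on-the-fly simulation into a lazily built DFA.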
