I'd gladly help you with this. I'm very excited to have a parser in Phobos!
On Thu, Sep 29, 2011 at 1:20 AM, Nick Sabalausky <[email protected]> wrote:
> "Jonathan M Davis" <[email protected]> wrote in message
> news:[email protected]...
>> On Wednesday, September 28, 2011 13:43 Nick Sabalausky wrote:
>>> "Jonathan M Davis" <[email protected]> wrote in message
>>> news:[email protected]...
>>>
>>> > I would point out that there is an intention to eventually get a D
>>> > lexer and parser into Phobos so that tools can take advantage of them.
>>> > Those could eventually lead to a frontend in D but would provide
>>> > benefits far beyond simply having the compiler in D.
>>>
>>> Is the interest more in a D-specific lexer/parser or a generalized one?
>>> Or is it more of a split vote? I seem to remember interest both ways,
>>> but I don't know whether there's any consensus among the DMD/Phobos crew.
>>>
>>> A generalized lexer is nothing more than a regex engine that has more
>>> than one distinct accept state (which then gets run over and over until
>>> EOF). And the FSM is made simply by doing a combined regex
>>> "(regexForToken1 | regexForToken2 | regexForToken3 | ... )", and then
>>> each of those parts just gets its own accept state. Which makes me
>>> wonder...
>>>
>>> There was a GSoC project to overhaul Phobos's regex engine, wasn't there?
>>> Is that done? Is it designed in a way that the stuff above wouldn't be
>>> real hard to add?
>>>
>>> And what about the algorithm? Is it a Thompson NFA (i.e., it traverses
>>> the NFA as if it were a DFA, effectively "creating" the DFA on-the-fly)?
>>> Or does it just traverse the NFA as an NFA? Or does it create an actual
>>> DFA and traverse that? An actual DFA would probably be best for a lexer.
>>> If a DFA, is it an optimized DFA? In my (limited) tests, it didn't seem
>>> like DFA-optimization would yield a notable benefit on typical
>>> programming-language tokens. It seems to be more suited to pathological
>>> cases.
>>
>> There is some desire to have a lexer and parser in Phobos which basically
>> have the same implementation as dmd (only in D instead of C++). That way,
>> they're very close to the actual compiler, and it's easy to port fixes and
>> improvements between the two.
>>
>> However, we definitely also want a more general lexer/parser generator
>> which takes advantage of D's metaprogramming capabilities. Andrei was
>> pushing more for that and doesn't really like the idea of the other, since
>> it would reduce the desire to produce the more general solution. So, there
>> _is_ some dissension on the matter. But there's definitely room for both.
>> It's just a question of time and manpower.
>
> Boy, I gotta say I'm really tempted to tackle this. I don't know if I
> *should* dedicate my already-tight time, but it's very tempting. And I have
> already written a generalized lexer generator in D (
> www.semitwist.com/goldie ), so I have that experience (and codebase) to
> draw upon.
>
> Only big question is whether it would be best to try to make Phobos's
> existing regex engine flexible enough that it could be used by the lexer
> (since a generalized lexer is essentially a regex engine with multiple
> accept states, and optionally some customizable hooks). I've posted some
> questions to that end in another branch of this thread.
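
To make the quoted "combined regex, one accept state per token" idea concrete, here is a minimal sketch. It is in Python only for brevity, and it uses Python's backtracking `re` module rather than Phobos's regex engine or a real DFA. The token names and patterns are made up for illustration and are not D's grammar. Each named group stands in for one accept state, and the loop is the "run over and over until EOF" part.

```python
import re

# Hypothetical token set, for illustration only (not D's real grammar).
TOKEN_SPECS = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# Combined regex "(tok1 | tok2 | ...)": each named group plays the role
# of a distinct accept state.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPECS)
)


def tokenize(text):
    """Apply the combined regex repeatedly until the end of the input."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        # m.lastgroup names the alternative that matched, i.e. which
        # "accept state" was reached.
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Note that Python's alternation picks the first alternative that matches, not the longest one, so the order of the token list matters here. A DFA-based lexer would more typically use longest-match, so treat this as an illustration of the shape of the idea and nothing more.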
