On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:
>> The D grammar is a 1000-line / hundreds of rules monster. I finished
>> writing it and am now crushing bugs.
>> God, that generates a 10_000 line module to parse it. I should
>> simplify the code generator somewhat.
>
> Science is done. Welcome to implementation :o).

Hey, it's only 3,000 lines now :) Coming from a thousand-line grammar,
that's not much of an inflation.

> I can't say how excited I am about this direction. I have this vision of
> having a D grammar published on the website that is actually "it", i.e. the
> same exact grammar is used by a validator that goes through all of our test
> suite. (The validator wouldn't do any semantic checking.) The parser
> generator _and_ the reference D grammar would be available in Phobos, so for
> anyone it would be dirt cheap to parse some D code and wander through the
> generated AST. The availability of a reference grammar and parser would be
> golden to a variety of D toolchain creators.

Indeed, but I fear the D grammar is a bit too complex to be easily
walked. Now that I read it, I realize that '1' is parsed as a leaf
10 levels deep! Compared to Lisp, it's... not in the same league, to say
the least. I will see about drastically simplifying the parse tree.

Does anyone have experience with other languages similar to D that
offer AST-walking? Doesn't C# have something like this?
(I'll have a look at Scala macros.)

> Just to gauge interest:
>
> 1. Would you consider submitting your work to Phobos?

Yes, of course. It's already Boost-licensed.
Seeing the review processes for other modules, it would most certainly
put the code in great shape. But then, it's far from being submittable
right now.

> 2. Do you think your approach can generate parsers competitive with
> hand-written ones? If not, why?

Right now, no, if only because I haven't taken any steps to make it
fast or to limit its RAM consumption.
After applying some ideas I have, I don't know.
There are many people here who are parser-aware and could help make
the code faster. But at the core, to allow mutually recursive rules,
the design uses classes:

    class A : someParserCombinationThatMayUseA { ... }

Which means A.parse (a static method) is just typeof(super).parse
(also static, and so on).
Does that entail any crippling disadvantage compared to a hand-written
parser?

Philippe
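
To make the class-based design described above concrete, here is a minimal sketch in D of a rule that inherits from a combinator referring back to the rule itself. The names `Result`, `Lit`, `Seq`, `Opt` and `Parens` are hypothetical, chosen only for illustration, and are not taken from the actual generator, which presumably builds a parse tree rather than returning a bare success flag.

```d
// Hypothetical result type: success flag plus the unconsumed input.
struct Result
{
    bool ok;
    string rest;
}

// Matches the literal string s at the start of the input.
class Lit(string s)
{
    static Result parse(string input)
    {
        if (input.length >= s.length && input[0 .. s.length] == s)
            return Result(true, input[s.length .. $]);
        return Result(false, input);
    }
}

// Matches every rule in order; fails without consuming if any rule fails.
class Seq(Rules...)
{
    static Result parse(string input)
    {
        string cur = input;
        foreach (R; Rules) // unrolled at compile time over the rule types
        {
            auto r = R.parse(cur);
            if (!r.ok)
                return Result(false, input);
            cur = r.rest;
        }
        return Result(true, cur);
    }
}

// Matches R if possible, otherwise succeeds without consuming anything.
class Opt(R)
{
    static Result parse(string input)
    {
        auto r = R.parse(input);
        return r.ok ? r : Result(true, input);
    }
}

// Recursive rule, roughly: Parens <- "(" Parens? ")"
// The class names itself inside its own base-class expression.
class Parens : Seq!(Lit!"(", Opt!Parens, Lit!")")
{
    static Result parse(string input)
    {
        // Forward to the combinator this rule inherits from.
        return typeof(super).parse(input);
    }
}

unittest
{
    assert(Parens.parse("(())").ok);
    assert(!Parens.parse("(()").ok);
    assert(Parens.parse("()x").rest == "x");
}
```

In this sketch every `parse` is a static function resolved at compile time, so the class hierarchy adds no virtual dispatch by itself; the forwarding call through `typeof(super)` is an ordinary static call. The checks can be run with `dmd -unittest -main` on the file. Whether the real generator shares these properties is not something this sketch can establish.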