On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote:
...
Bugs stopping Pegged from going forward should receive high priority. I also encourage others to participate to that and similar work.


Andrei

Ack! I share Philippe's sense of timing. This would have been even nicer to hear a year ago when both of us were actively committing ;)

I was about to close down a project or two that I care deeply about and started a long time ago, but now I see that this might be a harder decision.

Nonetheless, I am really glad that you are showing this interest! I like to hear stuff like this, since I too really like Pegged.


Andrei and Philippe,

I feel compelled to share some of my thoughts that I never had time to finish back then.

I was working on a parser-engine-as-a-library that could be used as optimized internals for Pegged, as well as by any other tools that need to recognize these common patterns.

The idea was to expose an API like this:

    string makeParser()
    {
        auto builder = new ParserBuilder!char;
        builder.pushSequence();           // open the sequence 'x' ( 'y'? )
            builder.literal('x');         // match the literal 'x'
            builder.pushMaybe();          // open the optional group
                builder.literal('y');
            builder.pop();                // close the optional group
        builder.pop();                    // close the sequence
        return builder.toDCode("callMe"); // emit D source for a parser named "callMe"
    }

    const foo = makeParser();

    pragma(msg, foo);

    mixin(foo);

That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in nice syntax that is easy to use.

The code already takes a somewhat different strategy than Pegged's original one. Rather than generating a bunch of templates that dmd then has to instantiate to actualize the parser, it just emits a bunch of very primitive procedural D code. I suspect that this approach would mix in far faster with current dmd, because the deeply nested templates generated by Pegged seemed to be a bottleneck. I have to hand it to Philippe, though, for coming up with a very clever way to bootstrap the thing: once I saw how his templates assembled together, I realized just how convenient that was!
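To make the contrast concrete, here is a hedged sketch of the two strategies for the same 'x' ( 'y'? ) grammar. The names and bodies are illustrative only, not Pegged's actual output or mine:

```d
// Template route: each grammar node becomes a template that dmd must
// instantiate, and deep grammars mean deep instantiation chains.
template Literal(char c)
{
    bool Literal(string s, ref size_t pos)
    {
        if (pos < s.length && s[pos] == c) { ++pos; return true; }
        return false;
    }
}

// String-mixin route: the generator emits flat procedural code for
// 'x' ( 'y'? ) once, and dmd only has to parse and compile it.
enum generated = q{
    bool callMe(string s)
    {
        size_t pos = 0;
        if (pos >= s.length || s[pos] != 'x') return false;
        ++pos;
        if (pos < s.length && s[pos] == 'y') ++pos; // the optional 'y'
        return true; // prefix match, PEG-style
    }
};
mixin(generated);

void main()
{
    size_t p = 0;
    assert(Literal!'x'("xy", p) && p == 1);
    assert(callMe("x") && callMe("xy") && !callMe("y"));
}
```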

(And I think my parser generator still has to be taught how to avoid making redundant branches in its output: there's some hash table action that belongs in there somewhere.)
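For the record, the hash table action I have in mind is ordinary hash-consing: intern every sub-expression under a structural key, so identical branches get emitted only once. A hypothetical sketch (Node, key, and intern are illustrative names, not from parser_builder.d):

```d
// Each grammar sub-expression as a small tree node.
struct Node
{
    string kind;      // e.g. "literal", "sequence"
    string payload;   // e.g. the literal character
    Node*[] children;
}

// A structural key: two nodes with the same key are the same expression.
string key(const(Node)* n)
{
    string k = n.kind ~ ":" ~ n.payload;
    foreach (c; n.children)
        k ~= "(" ~ key(c) ~ ")";
    return k;
}

Node*[string] internTable;

// Return the canonical node for this structure, creating it on first sight.
Node* intern(Node* n)
{
    auto k = key(n);
    if (auto existing = k in internTable)
        return *existing;
    internTable[k] = n;
    return n;
}

void main()
{
    auto a = intern(new Node("literal", "x"));
    auto b = intern(new Node("literal", "x"));
    assert(a is b); // the redundant branch collapses into one
}
```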

The small amount of code that I have for it is here:
https://github.com/chadjoan/xdc/blob/master/src/xdc/parser_builder/parser_builder.d

I wanted to eventually make it generic enough to recognize patterns in things besides strings. Being able to write grammars that recognize patterns in ASTs is /useful/. That leads into the whole xdc project: automate all of the tedious crud in semantic analysis, and thus make compiler writing, and possibly other AST manipulation in user code, much easier.
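To give a feel for what "generic over the element type" would mean (purely illustrative, not xdc's API): the same combinator shape can match characters or AST node tags with no changes to the machinery:

```d
// One primitive, parameterized over the element type E.
bool literal(E)(const(E)[] input, ref size_t pos, E want)
{
    if (pos < input.length && input[pos] == want) { ++pos; return true; }
    return false;
}

// A toy AST node classification, standing in for real AST nodes.
enum Tag { call, ident, intLit }

void main()
{
    // Matching over characters...
    size_t p = 0;
    assert(literal("xy", p, 'x'));

    // ...and over a stream of AST tags with the same machinery.
    auto tags = [Tag.call, Tag.ident];
    size_t q = 0;
    assert(literal(tags, q, Tag.call));
    assert(literal(tags, q, Tag.ident));
}
```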

The other thing I wanted to do was to optimize it.
- I intended it to do no allocations unless the caller asks for it.
- I intended to do a bunch of PEG/regex hybridization.

I noticed some mathematical properties of PEGs and regular expressions that should allow you to mix the two as much as possible. All you have to do is tell it how to behave at the boundaries where they meet. And given that PEGs already define their own behavior pretty well, it would become possible to lower a lot of a PEG into regular expressions connected with a minimal set of PEG rules. This would be some awesome lowering: if you first do a pass that inlines as many rules as possible, and then do a second pass that converts PEG elements into regular elements whenever possible, then I feel like the thing will be damned near optimal. If you are wondering what these mathematical properties are, then I encourage you to look at this snippet where I define "unitary" and "non-unitary" expressions, for lack of prior terms:
http://pastebin.com/iYBypRc5
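As a hedged illustration of the kind of lowering I mean (the rule and names here are mine, for exposition, not from the pastebin): where PEG's greedy repetition and ordered choice happen to coincide with regular semantics, a rule like A <- 'a'* 'b' accepts exactly the regular language a*b, so it could be handed off wholesale to a regex engine:

```d
import std.regex : matchFirst, regex;

// Hand-written PEG-style matcher for A <- 'a'* 'b'.
bool pegA(string s, ref size_t pos)
{
    while (pos < s.length && s[pos] == 'a') ++pos; // 'a'* is greedy, like regex
    if (pos < s.length && s[pos] == 'b') { ++pos; return true; }
    return false;
}

void main()
{
    // For full-input matches, the PEG rule and the regex agree everywhere.
    auto re = regex("^a*b$");
    foreach (s; ["b", "ab", "aaab", "", "a", "ba"])
    {
        size_t pos = 0;
        bool pegMatch = pegA(s, pos) && pos == s.length;
        bool reMatch = !matchFirst(s, re).empty;
        assert(pegMatch == reMatch);
    }
}
```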

Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular-expression using regular alternation (|), and then have the resulting automaton consume in lock-step with the PEG parser. Whenever the PEG parser needs to do a lookbehind, it just checks to see if the companion automaton is in a matching state for the capture it needs.
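A minimal sketch of that lock-step idea, with a DFA hand-built for a single hypothetical look-behind pattern ".*ab" (nothing here is from the actual code):

```d
// The companion automaton consumes each character as the main parser
// does; a look-behind test then reduces to a state check.
struct LockstepDFA
{
    int state = 0; // 0: nothing useful, 1: just saw 'a', 2: just saw "ab"

    void feed(char c)
    {
        switch (state)
        {
            case 0, 2: state = (c == 'a') ? 1 : 0; break;
            case 1:    state = (c == 'b') ? 2 : (c == 'a' ? 1 : 0); break;
            default:   assert(0);
        }
    }

    // What the PEG parser would call instead of re-scanning backwards.
    bool lookBehindAB() const { return state == 2; }
}

void main()
{
    LockstepDFA dfa;
    foreach (c; "xxab") dfa.feed(c);
    assert(dfa.lookBehindAB());  // input so far ends in "ab"
    dfa.feed('c');
    assert(!dfa.lookBehindAB()); // one more char and it no longer does
}
```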

*sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming.

I worry that the greater threat to good AST manipulation tools in D is a lack of free time, not so much the DMD bugs.

I hope this is useful to someone!
