On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote:
...
Bugs stopping Pegged from going forward should receive high priority. I also encourage others to participate to that and similar work.


Andrei

Ack! I share Philippe's sense of timing. This would have been even nicer to hear a year ago when both of us were actively committing ;)

I was about to close down a project or two that I care deeply about and started a long time ago, but now I see that this might be a harder decision.

Nonetheless, I am really glad that you are showing this interest! I like to hear stuff like this, since I too really like Pegged.


Andrei and Philippe,

I feel compelled to share some of my thoughts that I never had time to finish back then.

I was working on a parser-engine-as-a-library that could be used as optimized internals for Pegged, as well as by any other tools that need to recognize these common patterns.

The idea was to expose an API like this:

    string makeParser()
    {
        auto builder = new ParserBuilder!char;
        builder.pushSequence();           // open the sequence 'x' ( 'y'? )
            builder.literal('x');         // match the literal 'x'
            builder.pushMaybe();          // open the optional group
                builder.literal('y');
            builder.pop();                // close the optional group
        builder.pop();                    // close the sequence
        return builder.toDCode("callMe"); // emit D source for a parser named "callMe"
    }

    const foo = makeParser();

    pragma(msg, foo);

    mixin(foo);

That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in nice syntax that is easy to use.

The code already takes a somewhat different strategy than Pegged's original one. Rather than generating a bunch of templates that dmd then has to instantiate to actualize the parser, it just emits a bunch of very primitive procedural D code. I suspect that this approach would mix in far faster with current dmd, because the deeply nested templates generated by Pegged seemed to be a bottleneck. I have to hand it to Philippe, though, for coming up with a very clever way to bootstrap the thing: once I saw how his templates assembled together, I realized just how convenient that was!
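To make the contrast concrete, here is a hedged sketch of the two strategies for the same 'x' ( 'y'? ) grammar. The names and bodies are illustrative only, not Pegged's actual output or mine:

```d
// Template route: each grammar node becomes a template that dmd must
// instantiate, and deep grammars mean deep instantiation chains.
template Literal(char c)
{
    bool Literal(string s, ref size_t pos)
    {
        if (pos < s.length && s[pos] == c) { ++pos; return true; }
        return false;
    }
}

// String-mixin route: the generator emits flat procedural code for
// 'x' ( 'y'? ) once, and dmd only has to parse and compile it.
enum generated = q{
    bool callMe(string s)
    {
        size_t pos = 0;
        if (pos >= s.length || s[pos] != 'x') return false;
        ++pos;
        if (pos < s.length && s[pos] == 'y') ++pos; // the optional 'y'
        return true; // prefix match, PEG-style
    }
};
mixin(generated);

void main()
{
    size_t p = 0;
    assert(Literal!'x'("xy", p) && p == 1);
    assert(callMe("x") && callMe("xy") && !callMe("y"));
}
```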

(And I think my parser generator still has to be taught how to avoid making redundant branches in its output: there's some hash table action that belongs in there somewhere.)
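For the record, the hash table action I have in mind is ordinary hash-consing: intern every sub-expression under a structural key, so identical branches get emitted only once. A hypothetical sketch (Node, key, and intern are illustrative names, not from parser_builder.d):

```d
// Each grammar sub-expression as a small tree node.
struct Node
{
    string kind;      // e.g. "literal", "sequence"
    string payload;   // e.g. the literal character
    Node*[] children;
}

// A structural key: two nodes with the same key are the same expression.
string key(const(Node)* n)
{
    string k = n.kind ~ ":" ~ n.payload;
    foreach (c; n.children)
        k ~= "(" ~ key(c) ~ ")";
    return k;
}

Node*[string] internTable;

// Return the canonical node for this structure, creating it on first sight.
Node* intern(Node* n)
{
    auto k = key(n);
    if (auto existing = k in internTable)
        return *existing;
    internTable[k] = n;
    return n;
}

void main()
{
    auto a = intern(new Node("literal", "x"));
    auto b = intern(new Node("literal", "x"));
    assert(a is b); // the redundant branch collapses into one
}
```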

The small amount of code that I have for it is here:
https://github.com/chadjoan/xdc/blob/master/src/xdc/parser_builder/parser_builder.d

I wanted to eventually make it generic enough to recognize patterns in things besides strings. Being able to write grammars that recognize patterns in ASTs is /useful/. That leads into the whole xdc project: automate all of the tedious crud in semantic analysis, and thus make compiler writing, and possibly other AST manipulation in user code, much easier.
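To give a feel for what "generic over the element type" would mean (purely illustrative, not xdc's API): the same combinator shape can match characters or AST node tags with no changes to the machinery:

```d
// One primitive, parameterized over the element type E.
bool literal(E)(const(E)[] input, ref size_t pos, E want)
{
    if (pos < input.length && input[pos] == want) { ++pos; return true; }
    return false;
}

// A toy AST node classification, standing in for real AST nodes.
enum Tag { call, ident, intLit }

void main()
{
    // Matching over characters...
    size_t p = 0;
    assert(literal("xy", p, 'x'));

    // ...and over a stream of AST tags with the same machinery.
    auto tags = [Tag.call, Tag.ident];
    size_t q = 0;
    assert(literal(tags, q, Tag.call));
    assert(literal(tags, q, Tag.ident));
}
```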

The other thing I wanted to do was to optimize it.
- I intended it to do no allocations unless the caller asks for it.
- I intended to do a bunch of PEG/regex hybridization.

I noticed some mathematical properties of PEGs and regular expressions that should allow you to mix the two as much as possible. All you have to do is tell it how to behave at the boundaries where they meet. And given that PEGs already define their own behavior pretty well, it would become possible to lower a lot of a PEG into regular expressions connected with a minimal set of PEG rules. This would be some awesome lowering: if you first do a pass that inlines as many rules as possible, and then do a second pass that converts PEG elements into regular elements whenever possible, then I feel like the thing will be damned near optimal. If you are wondering what these mathematical properties are, then I encourage you to look at this snippet where I define "unitary" and "non-unitary" expressions, for lack of prior terms:
http://pastebin.com/iYBypRc5
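As a hedged illustration of the kind of lowering I mean (the rule and names here are mine, for exposition, not from the pastebin): where PEG's greedy repetition and ordered choice happen to coincide with regular semantics, a rule like A <- 'a'* 'b' accepts exactly the regular language a*b, so it could be handed off wholesale to a regex engine:

```d
import std.regex : matchFirst, regex;

// Hand-written PEG-style matcher for A <- 'a'* 'b'.
bool pegA(string s, ref size_t pos)
{
    while (pos < s.length && s[pos] == 'a') ++pos; // 'a'* is greedy, like regex
    if (pos < s.length && s[pos] == 'b') { ++pos; return true; }
    return false;
}

void main()
{
    // For full-input matches, the PEG rule and the regex agree everywhere.
    auto re = regex("^a*b$");
    foreach (s; ["b", "ab", "aaab", "", "a", "ba"])
    {
        size_t pos = 0;
        bool pegMatch = pegA(s, pos) && pos == s.length;
        bool reMatch = !matchFirst(s, re).empty;
        assert(pegMatch == reMatch);
    }
}
```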

Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular-expression using regular alternation (|), and then have the resulting automaton consume in lock-step with the PEG parser. Whenever the PEG parser needs to do a lookbehind, it just checks to see if the companion automaton is in a matching state for the capture it needs.
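A minimal sketch of that lock-step idea, with a DFA hand-built for a single hypothetical look-behind pattern ".*ab" (nothing here is from the actual code):

```d
// The companion automaton consumes each character as the main parser
// does; a look-behind test then reduces to a state check.
struct LockstepDFA
{
    int state = 0; // 0: nothing useful, 1: just saw 'a', 2: just saw "ab"

    void feed(char c)
    {
        switch (state)
        {
            case 0, 2: state = (c == 'a') ? 1 : 0; break;
            case 1:    state = (c == 'b') ? 2 : (c == 'a' ? 1 : 0); break;
            default:   assert(0);
        }
    }

    // What the PEG parser would call instead of re-scanning backwards.
    bool lookBehindAB() const { return state == 2; }
}

void main()
{
    LockstepDFA dfa;
    foreach (c; "xxab") dfa.feed(c);
    assert(dfa.lookBehindAB());  // input so far ends in "ab"
    dfa.feed('c');
    assert(!dfa.lookBehindAB()); // one more char and it no longer does
}
```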

*sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming.

I worry that the greater threat to good AST manipulation tools in D is a lack of free time, not so much the DMD bugs.

I hope this is useful to someone!
