On Wed, May 13, 2015 at 7:56 AM, Keean Schupke <ke...@fry-it.com> wrote:

> The problem I can see for performance is that you want to statically
> generate and resolve the AST, so that:
>

> 'a', 'b', 'c' | 'a', 'b', 'd'
>

In LR, that would be a standard left-factoring problem.

I agree with you that this is a case where back-tracking recursive descent
(BTRD) tends to work poorly. Or rather: it's a place where programmers tend
to *use* BTRD poorly, in that they fail to hand-implement the appropriate
left factoring. Though it should be noted that (a) these cases are rare in
real grammars (b) these cases are *dynamically* rare in real programs, (c)
memoizing parse trees may be a sufficient solution in practice, and
therefore (d) we should favor expressive clarity over performance.

The nasty case is when these arise recursively, as in:

p1 -> a b c  // p1 is your example
p1 -> a b d

p2 -> p1 d e // yet more backtracking over the input range
p2 -> p1 d f

the problem here is that a reduction of p1 or p2 is provisional until we
know that backtracking will not "undo" that reduction. If the parser is
built using combinators, there doesn't seem to be any obvious way to know
when a reduction may be hazarded by backtracking. Or at least I'm not
seeing it. Once you hit the point where the remaining options are "reduce
or fail" you can discard the memoized trees, but until you hit that point
you have to hang on to them.


But I'm wondering whether we may be talking past each other about static
generation. Though I spoke of building an AST for the parser and running
some CFG-checking algorithm, and that's definitely a static thing, I was
*not* contemplating this as a way to execute the parser. I was thinking of
it as a thing that could be done in a testing program, after which the
parser itself could be run in the usual way using combinators to specify
the recursive descent.

What you don't want is to loop over the syntax tree at runtime.
>

Well, you certainly don't want blind looping, but I think you mean
backtrack here rather than loop. And it's not so much that you don't want
to backtrack as that you don't want to redo the work you have already done
when you *do* backtrack. But assuming your parser is functional, it's a
performance issue rather than a correctness issue.

Hmm. Though it really helps to be able to do some compile-time analysis so
that you know when memoized subtress can safely be discarded as unreachable.


shap
_______________________________________________
bitc-dev mailing list
bitc-dev@coyotos.org
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to