Hi, Guys.
Perhaps, I should have mentioned this earlier. Might have avoided such a
long post out of the blue.
About three weeks ago I was doing some thinking and came up with a
similar solution.
I did some reading around and, like Chris, found that the best way to
describe the algorithm was as recursive-ascent used only on the
left-recursive non-terminals.
The way I had thought about it was in terms of matching either
left-recursive alternatives or non-left-recursive alternatives, for a
given non-terminal.
For the right most derivation, the non-left-recursive alts will normally
be matched just once at the start of the string. From then on, it's all
about supplying a left hand side to a left-recursive rule.
With that in mind, I explicitly think of the left and non-left
alternatives in my non-terminal as actually being two separate
non-terminals.
What this means is that, calling the original non-terminal first matches
the non-left and and then gives that result to the left-recursive part.
Now, it's pretty obvious that we can still memoize the non-left part
since it doesn't change depending on what came before it. What is
slightly less obvious is that, because the left-recursive part only
changes in terms of the left-hand that was supplied to it, we can also
memoize that. The key is, we memoize all the children of the
left-recursive part except for the left-hand side.
When we come to retrieve a left-recursive part that has been memoized,
instead of getting a node and its children, we create the node then give
it its left-hand side and memoized children.
Now, I mentioned that the right most derivation will normally match the
non-left part just once. It turns out that for cases where a rule has a
second recursive call, we want to match only the non-left part for that
second call and then carry on our way.
Try deriving trees for this grammar,
M <- M 'x' M / 'm'
over some of these strings,
'm' ('xm')*
to see what I mean.
If we only match 'm' for the second M we get what we want. If we let the
second M act like a normal call, we get a right associative tree which
isn't what we want.
So, at this point we have linear time parsing for directly
left-recursive grammars. What about indirect?
Well, that's also pretty easy and doesn't complicate things as much as
you might think.
What we need to do is, instead of adding just two non-terminals, we add
two for each non-terminal involved in the indirect left-recursion.
I'll use this example to help explain myself.
Consider we want to evaluate A from the following grammar.
A <- B 'a' / 'A'
B <- C 'b' / 'B'
C <- A 'c' / 'C'
We invoke the original A,
{
// Find the left hand side
if ((leftHand = a_nonleft()) != SUCCESS)
if ((leftHand = b_nonleft()) != SUCCESS)
if ((leftHand = c_nonleft()) != SUCCESS)
return fail
end
end
end
result = fail
// Give the left hand side to the part that expects it
while (leftHand == SUCCESS)
if (leftHand.type == A)
// We have an A which is the same type as our original invocation
result = leftHand
leftHand = c_left_rec(leftHand)
else
if (leftHand.type == B)
leftHand = a_left_rec(leftHand)
else
if (leftHand.type == C)
leftHand = b_left_rec(leftHand)
end
end
end
end
// Return the furthest A we could find
return result
}
Some things to note about this:
There is ordering based disambiguation when checking what the first left
hand side should be and choosing between which left_rec part to test for
a given type. (This example only has the first.)
Within a given non-terminal, ordering only matters between non_left v.s.
non_left and left_rec v.s. left_rec.
e.g.
X <- X Y 'a'
/ X 'a' Y
/ 'xa' Y
/ 'x' Y 'a'
Y <- [ab]
Restrictions on the grammars:
For direct left-recursion, the non-terminal has to have a
non-left-recursive part.
For indirect, at least one of the non-terminals has to have a
non-left-recursive part.
The non-left-recursive part has to be able to terminate.
Each alternative of the left-recursive part must consume at least one char.
e.g. The following are nonsense.
X <- X / 'x'
Y <- Y & 'y' / 'y'
W <- W ! 'w' / 'w'
V <- V / V 'v' / 'v'
It's rather late on my side of the world so I'll stop now.
I've got pages of notes here and it's likely that I've forgotten to
mention some relevant details.
Anyway,
linear left recursive = Yes
direct and indirect = Yes.
Orlando Hill.
p.s. There are a few other things I should probably talk about over the
next few days to avoid something like this from happening again.
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg