Leo reduction for a cyclic grammar

Timothy Allen Tue, 01 Jan 2019 12:13:07 -0800

Thanks, Jeffrey, for answering my question about empty rules. I haven't
decided if I'll eliminate them the same way Marpa does, but now I feel
much more confident that I can do it if necessary.


Right now, I'm studying Leo reduction, via Loup Vaillant's tutorial and
also a PEG-and-Earley tutorial somebody pointed me to. I haven't gotten
to actual parsing yet, I'm still trying to nail down reduction, but I'm
sure I'll get there eventually.

Consider the following grammar, whose starting symbol is <A>:

<A> ::= "x" | <B>
<B> ::= <A>

When fed the input "x", an Earley recogniser without right-reduction
would wind up with these items in the final set:

<A> ::= "x" · (0)
<A> ::= <B> · (0)
<B> ::= <A> · (0)

...and sure enough, if I don't attempt Leo reduction, that's what I
get.

If I do attempt Leo reduction, I have a problem. I'm looking for the
"topmost" reduction, but because the grammar has a cycle, there's no
actual top. This cycle isn't fatal; since I'm memoising this reduction,
I can drop a sentinel in the cache before I recurse, and if I find a
sentinel I know I'm done. However, this means that the "topmost" item
will be determined by the order in which I iterate over Earley items to
perform completion steps; whichever one I grab first will become the
topmost.

If I happen to pick <A> ::= <B> · (0) as my topmost, it will survive
into the final Earley set, and I will have two items for the start
symbol spanning the entire input, and therefore I can build two parse
trees: one for 0 iterations of the cycle, and one for 1 iteration. That
seems tidy.

On the other hand, if I happen to pick <B> ::= <A> · (0) as my topmost,
the final Earley set will only have one item for the start symbol
spanning the entire input, and therefore I can build only one parse
tree: the one for 0 iterations of the cycle.

I'd rather not have a recogniser that behaves inconsistently; Earley
parsing is supposed to be the "Just Works™" algorithm, after all. I can
think of a few alternatives:

 * I could declare that cyclic grammars are Bad, and automatically
reject them. Seems a bit heavy-handed.
 * I could declare that cyclic grammars are Unsupported, and allow them
but make no guarantees about finding all the parse trees. I notice that
libmarpa detects cyclic grammars and warns about them, but I can't find
any documentation saying cyclic grammars are any less reliable than
proper ones, just that they're less efficient.
 * If there were some principled way to choose a "topmost" item, I
could ensure I always picked the "right" one, and then I would be able
to make guarantees like "only the simplest parse tree is found" or
"only the simplest parse tree plus one iteration of the cycle".

What should I do?

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Leo reduction for a cyclic grammar

Reply via email to