Thanks, Jeffrey, for answering my question about empty rules. I haven't decided if I'll eliminate them the same way Marpa does, but now I feel much more confident that I can do it if necessary.
Right now, I'm studying Leo reduction, via Loup Vaillant's tutorial and also a PEG-and-Earley tutorial somebody pointed me to. I haven't gotten to actual parsing yet, I'm still trying to nail down reduction, but I'm sure I'll get there eventually. Consider the following grammar, whose starting symbol is <A>: <A> ::= "x" | <B> <B> ::= <A> When fed the input "x", an Earley recogniser without right-reduction would wind up with these items in the final set: <A> ::= "x" · (0) <A> ::= <B> · (0) <B> ::= <A> · (0) ...and sure enough, if I don't attempt Leo reduction, that's what I get. If I do attempt Leo reduction, I have a problem. I'm looking for the "topmost" reduction, but because the grammar has a cycle, there's no actual top. This cycle isn't fatal; since I'm memoising this reduction, I can drop a sentinel in the cache before I recurse, and if I find a sentinel I know I'm done. However, this means that the "topmost" item will be determined by the order in which I iterate over Earley items to perform completion steps; whichever one I grab first will become the topmost. If I happen to pick <A> ::= <B> · (0) as my topmost, it will survive into the final Earley set, and I will have two items for the start symbol spanning the entire input, and therefore I can build two parse trees: one for 0 iterations of the cycle, and one for 1 iteration. That seems tidy. On the other hand, if I happen to pick <B> ::= <A> · (0) as my topmost, the final Earley set will only have one item for the start symbol spanning the entire input, and therefore I can build only one parse tree: the one for 0 iterations of the cycle. I'd rather not have a recogniser that behaves inconsistently; Earley parsing is supposed to be the "Just Works™" algorithm, after all. I can think of a few alternatives: * I could declare that cyclic grammars are Bad, and automatically reject them. Seems a bit heavy-handed. * I could declare that cyclic grammars are Unsupported, and allow them but make no guarantees about finding all the parse trees. I notice that libmarpa detects cyclic grammars and warns about them, but I can't find any documentation saying cyclic grammars are any less reliable than proper ones, just that they're less efficient. * If there were some principled way to choose a "topmost" item, I could ensure I always picked the "right" one, and then I would be able to make guarantees like "only the simplest parse tree is found" or "only the simplest parse tree plus one iteration of the cycle". What should I do? -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
