With great interest I have read the paper on the Marpa parser. This summer > I have created my own Earley parser, a probabilistic one > <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and I > am looking to implement some of Marpa's innovations in my own. > > Right now I am working on error handling. One person suggested I > implement synchronizing rules like in shift-reduce parsers > <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>. > I figured panic mode should behave a bit differently for probabilistic > ambiguous grammars, so I settled on the following (text taken from issue on > Github): > > Consider the following grammar: > > S -> AB C; (0.5) > S -> A B ; (0.5) > > A -> a a ; (0.5) > B -> b b ; (0.5) > C -> c c ; (0.5) > > AB -> a a ; b b ; (0.5) > > A -> a <error> ; (0.5) > B -> b <error> ; (0.5) > AB -> a <error> ; (0.5) > > On the input a a ; b z ; c c ; with goal S. When encountering z, the > parser could apply both B -> b <error> ; and AB -> a <error> ;. We would > like to consider AB because that's the only way to make S. So we need to > consider both error rules. > > Now consider the same grammar, except the first rule is S -> AB; on the > same input. We have established that we want to consider both error rules, > and in this case both make a valid parse. However, we want the probability > of AB -> a <error> ; being applied to be lower than that of B -> b > <error> ;, because we want to keep the number of ignored tokens minimal. > > That's why I have decided to make the error rule probability multiply > every time a token is ignored. The inner score of applying B -> <error> ; is > then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4. > > > I suspect that you have more sophisticated knowledge than I do about > Earley parsing, so I seek out your opinion: do you think this is a good > idea? It also makes me a bit uncomfortable that above grammar fails > terminally on a a ; b z ; c c ; cc ; , because the parser gets out of > panic mode after encountering a semicolon. But maybe that should be a > problem for another error handling strategy. > > Which bring me to the Ruby slippers error handling. I don't know if I have > a good idea of how this works. The way I interpret is as follows: the > parser might stall on input which prohibits it from succeeding (no states > follow from predict, no states follow from scan or there are no states to > complete). Because the chart will always contain some active states, you > know what the parser expects to succeed and you could let the user decide > whether to manipulate the input stream so that the parser may proceed given > this information (eg, drop token, insert a token, etc). Is that correct? > > Thanks for your help! > > Maarten Trompper >
-- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
