Thanks for your interest. One approach might be to do a raw, non-probabilistic parse in Marpa, and then use its Abstract Syntax Forests <http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/ASF.pod> to do the probabilistic analysis.
I'm about to run errands, but I've printed your email out to take with me. I'll let you know if I have other thoughts. -- jeffrey On Fri, Mar 10, 2017 at 12:15 PM, Maarten <[email protected]> wrote: > With great interest I have read the paper on the Marpa parser. This summer >> I have created my own Earley parser, a probabilistic one >> <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and >> I am looking to implement some of Marpa's innovations in my own. >> >> Right now I am working on error handling. One person suggested I >> implement synchronizing rules like in shift-reduce parsers >> <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>. >> I figured panic mode should behave a bit differently for probabilistic >> ambiguous grammars, so I settled on the following (text taken from issue on >> Github): >> >> Consider the following grammar: >> >> S -> AB C; (0.5) >> S -> A B ; (0.5) >> >> A -> a a ; (0.5) >> B -> b b ; (0.5) >> C -> c c ; (0.5) >> >> AB -> a a ; b b ; (0.5) >> >> A -> a <error> ; (0.5) >> B -> b <error> ; (0.5) >> AB -> a <error> ; (0.5) >> >> On the input a a ; b z ; c c ; with goal S. When encountering z, the >> parser could apply both B -> b <error> ; and AB -> a <error> ;. We would >> like to consider AB because that's the only way to make S. So we need to >> consider both error rules. >> >> Now consider the same grammar, except the first rule is S -> AB; on the >> same input. We have established that we want to consider both error rules, >> and in this case both make a valid parse. However, we want the probability >> of AB -> a <error> ; being applied to be lower than that of B -> b >> <error> ;, because we want to keep the number of ignored tokens minimal. >> >> That's why I have decided to make the error rule probability multiply >> every time a token is ignored. The inner score of applying B -> <error> ; is >> then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4. >> >> >> I suspect that you have more sophisticated knowledge than I do about >> Earley parsing, so I seek out your opinion: do you think this is a good >> idea? It also makes me a bit uncomfortable that above grammar fails >> terminally on a a ; b z ; c c ; cc ; , because the parser gets out of >> panic mode after encountering a semicolon. But maybe that should be a >> problem for another error handling strategy. >> >> Which bring me to the Ruby slippers error handling. I don't know if I >> have a good idea of how this works. The way I interpret is as follows: the >> parser might stall on input which prohibits it from succeeding (no states >> follow from predict, no states follow from scan or there are no states to >> complete). Because the chart will always contain some active states, you >> know what the parser expects to succeed and you could let the user decide >> whether to manipulate the input stream so that the parser may proceed given >> this information (eg, drop token, insert a token, etc). Is that correct? >> >> Thanks for your help! >> >> Maarten Trompper >> > > > -- > You received this message because you are subscribed to the Google Groups > "marpa parser" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
