I'm now looking at the Stolcke article <http://www.aclweb.org/anthology/J95-2002>. I am sorry to confess that I was not aware of it before.
It's not relevant to a probabilistic approach, but I think you might find your error detection grammar fits Ruby Slippers parsing rather well. Define an <error> token, one which is never found in actual input. When the parser is unable to accept the current input, on a Ruby Slippers basis, feed it an <error> token. This will detect errors, and minimize them if proceeding left-to-right minimizes. It may happen that a strict left-to-right minimization does not produce the fewest <error>'s. This would be in cases were the parser can accept the current input, but it's a "garden path", and the parser would be better off recognizing an error instead -- one <error> now will save 2 or more later. Here, with Marpa, going to an ASF-based strategy might be better. After looking at Stolcke, I may have more to say. On Fri, Mar 10, 2017 at 12:24 PM, Jeffrey Kegler < [email protected]> wrote: > Thanks for your interest. One approach might be to do a raw, > non-probabilistic parse in Marpa, and then use its Abstract Syntax Forests > <http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/ASF.pod> to do the > probabilistic analysis. > > I'm about to run errands, but I've printed your email out to take with > me. I'll let you know if I have other thoughts. > > -- jeffrey > > On Fri, Mar 10, 2017 at 12:15 PM, Maarten <[email protected]> > wrote: > >> With great interest I have read the paper on the Marpa parser. This >>> summer I have created my own Earley parser, a probabilistic one >>> <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and >>> I am looking to implement some of Marpa's innovations in my own. >>> >>> Right now I am working on error handling. One person suggested I >>> implement synchronizing rules like in shift-reduce parsers >>> <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>. >>> I figured panic mode should behave a bit differently for probabilistic >>> ambiguous grammars, so I settled on the following (text taken from issue on >>> Github): >>> >>> Consider the following grammar: >>> >>> S -> AB C; (0.5) >>> S -> A B ; (0.5) >>> >>> A -> a a ; (0.5) >>> B -> b b ; (0.5) >>> C -> c c ; (0.5) >>> >>> AB -> a a ; b b ; (0.5) >>> >>> A -> a <error> ; (0.5) >>> B -> b <error> ; (0.5) >>> AB -> a <error> ; (0.5) >>> >>> On the input a a ; b z ; c c ; with goal S. When encountering z, the >>> parser could apply both B -> b <error> ; and AB -> a <error> ;. We >>> would like to consider AB because that's the only way to make S. So we >>> need to consider both error rules. >>> >>> Now consider the same grammar, except the first rule is S -> AB; on the >>> same input. We have established that we want to consider both error rules, >>> and in this case both make a valid parse. However, we want the probability >>> of AB -> a <error> ; being applied to be lower than that of B -> b >>> <error> ;, because we want to keep the number of ignored tokens minimal. >>> >>> That's why I have decided to make the error rule probability multiply >>> every time a token is ignored. The inner score of applying B -> <error> >>> ; is then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4. >>> >>> >>> I suspect that you have more sophisticated knowledge than I do about >>> Earley parsing, so I seek out your opinion: do you think this is a good >>> idea? It also makes me a bit uncomfortable that above grammar fails >>> terminally on a a ; b z ; c c ; cc ; , because the parser gets out of >>> panic mode after encountering a semicolon. But maybe that should be a >>> problem for another error handling strategy. >>> >>> Which bring me to the Ruby slippers error handling. I don't know if I >>> have a good idea of how this works. The way I interpret is as follows: the >>> parser might stall on input which prohibits it from succeeding (no states >>> follow from predict, no states follow from scan or there are no states to >>> complete). Because the chart will always contain some active states, you >>> know what the parser expects to succeed and you could let the user decide >>> whether to manipulate the input stream so that the parser may proceed given >>> this information (eg, drop token, insert a token, etc). Is that correct? >>> >>> Thanks for your help! >>> >>> Maarten Trompper >>> >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "marpa parser" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
