Questions about error handling in Earley parsers

Maarten Fri, 10 Mar 2017 12:17:31 -0800

With great interest I have read the paper on the Marpa parser. This summer
> I have created my own Earley parser, a probabilistic one
> <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and I
> am looking to implement some of Marpa's innovations in my own.
>
> Right now I am working on error handling. One person suggested I
> implement synchronizing rules like in shift-reduce parsers
> <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>.
> I figured panic mode should behave a bit differently for probabilistic
> ambiguous grammars, so I settled on the following (text taken from issue on
> Github):
>
> Consider the following grammar:
>
> S -> AB C; (0.5)
> S -> A B ; (0.5)
>
> A -> a a ; (0.5)
> B -> b b ; (0.5)
> C -> c c ; (0.5)
>
> AB -> a a ; b b ; (0.5)
>
> A -> a <error> ; (0.5)
> B -> b <error> ; (0.5)
> AB -> a <error> ; (0.5)
>
> On the input a a ; b z ; c c ; with goal S. When encountering z, the
> parser could apply both B -> b <error> ; and AB -> a <error> ;. We would
> like to consider AB because that's the only way to make S. So we need to
> consider both error rules.
>
> Now consider the same grammar, except the first rule is S -> AB; on the
> same input. We have established that we want to consider both error rules,
> and in this case both make a valid parse. However, we want the probability
> of AB -> a <error> ; being applied to be lower than that of B -> b
> <error> ;, because we want to keep the number of ignored tokens minimal.
>
> That's why I have decided to make the error rule probability multiply
> every time a token is ignored. The inner score of applying B -> <error> ; is
> then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4.
>
>
> I suspect that you have more sophisticated knowledge than I do about
> Earley parsing, so I seek out your opinion: do you think this is a good
> idea? It also makes me a bit uncomfortable that above grammar fails
> terminally on a a ; b z ; c c ; cc ; , because the parser gets out of
> panic mode after encountering a semicolon. But maybe that should be a
> problem for another error handling strategy.
>
> Which bring me to the Ruby slippers error handling. I don't know if I have
> a good idea of how this works. The way I interpret is as follows: the
> parser might stall on input which prohibits it from succeeding (no states
> follow from predict, no states follow from scan or there are no states to
> complete). Because the chart will always contain some active states, you
> know what the parser expects to succeed and you could let the user decide
> whether to manipulate the input stream so that the parser may proceed given
> this information (eg, drop token, insert a token, etc). Is that correct?
>
> Thanks for your help!
>
> Maarten Trompper
>


-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to marpa-parser+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Questions about error handling in Earley parsers

Reply via email to