Re: Questions about error handling in Earley parsers

Jeffrey Kegler Fri, 10 Mar 2017 12:24:44 -0800

Thanks for your interest.  One approach might be to do a raw,
non-probabilistic parse in Marpa, and then use its Abstract Syntax Forests
<http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/ASF.pod> to do the
probabilistic analysis.


I'm about to run errands, but I've printed your email out to take with me.
I'll let you know if I have other thoughts.

-- jeffrey

On Fri, Mar 10, 2017 at 12:15 PM, Maarten <[email protected]> wrote:

> With great interest I have read the paper on the Marpa parser. This summer
>> I have created my own Earley parser, a probabilistic one
>> <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and
>> I am looking to implement some of Marpa's innovations in my own.
>>
>> Right now I am working on error handling. One person suggested I
>> implement synchronizing rules like in shift-reduce parsers
>> <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>.
>> I figured panic mode should behave a bit differently for probabilistic
>> ambiguous grammars, so I settled on the following (text taken from issue on
>> Github):
>>
>> Consider the following grammar:
>>
>> S -> AB C; (0.5)
>> S -> A B ; (0.5)
>>
>> A -> a a ; (0.5)
>> B -> b b ; (0.5)
>> C -> c c ; (0.5)
>>
>> AB -> a a ; b b ; (0.5)
>>
>> A -> a <error> ; (0.5)
>> B -> b <error> ; (0.5)
>> AB -> a <error> ; (0.5)
>>
>> On the input a a ; b z ; c c ; with goal S. When encountering z, the
>> parser could apply both B -> b <error> ; and AB -> a <error> ;. We would
>> like to consider AB because that's the only way to make S. So we need to
>> consider both error rules.
>>
>> Now consider the same grammar, except the first rule is S -> AB; on the
>> same input. We have established that we want to consider both error rules,
>> and in this case both make a valid parse. However, we want the probability
>> of AB -> a <error> ; being applied to be lower than that of B -> b
>> <error> ;, because we want to keep the number of ignored tokens minimal.
>>
>> That's why I have decided to make the error rule probability multiply
>> every time a token is ignored. The inner score of applying B -> <error> ; is
>> then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4.
>>
>>
>> I suspect that you have more sophisticated knowledge than I do about
>> Earley parsing, so I seek out your opinion: do you think this is a good
>> idea? It also makes me a bit uncomfortable that above grammar fails
>> terminally on a a ; b z ; c c ; cc ; , because the parser gets out of
>> panic mode after encountering a semicolon. But maybe that should be a
>> problem for another error handling strategy.
>>
>> Which bring me to the Ruby slippers error handling. I don't know if I
>> have a good idea of how this works. The way I interpret is as follows: the
>> parser might stall on input which prohibits it from succeeding (no states
>> follow from predict, no states follow from scan or there are no states to
>> complete). Because the chart will always contain some active states, you
>> know what the parser expects to succeed and you could let the user decide
>> whether to manipulate the input stream so that the parser may proceed given
>> this information (eg, drop token, insert a token, etc). Is that correct?
>>
>> Thanks for your help!
>>
>> Maarten Trompper
>>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Questions about error handling in Earley parsers

Reply via email to