Re: Questions about error handling in Earley parsers

Jeffrey Kegler Fri, 10 Mar 2017 17:20:03 -0800

I'm now looking at the Stolcke article
<http://www.aclweb.org/anthology/J95-2002>.  I am sorry to confess that I
was not aware of it before.


It's not relevant to a probabilistic approach, but I think you might find
your error detection grammar fits Ruby Slippers parsing rather well.
Define an <error> token, one which is never found in actual input.  When
the parser is unable to accept the current input, on a Ruby Slippers basis,
feed it an <error> token.  This will detect errors, and minimize them if
proceeding left-to-right minimizes.

It may happen that a strict left-to-right minimization does not produce the
fewest <error>'s.  This would be in cases were the parser can accept the
current input, but it's a "garden path", and the parser would be better off
recognizing an error instead -- one <error> now will save 2 or more later.
Here, with Marpa, going to an ASF-based strategy might be better.

After looking at Stolcke, I may have more to say.

On Fri, Mar 10, 2017 at 12:24 PM, Jeffrey Kegler <
[email protected]> wrote:

> Thanks for your interest.  One approach might be to do a raw,
> non-probabilistic parse in Marpa, and then use its Abstract Syntax Forests
> <http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/ASF.pod> to do the
> probabilistic analysis.
>
> I'm about to run errands, but I've printed your email out to take with
> me.  I'll let you know if I have other thoughts.
>
> -- jeffrey
>
> On Fri, Mar 10, 2017 at 12:15 PM, Maarten <[email protected]>
> wrote:
>
>> With great interest I have read the paper on the Marpa parser. This
>>> summer I have created my own Earley parser, a probabilistic one
>>> <https://github.com/digitalheir/java-probabilistic-earley-parser/>, and
>>> I am looking to implement some of Marpa's innovations in my own.
>>>
>>> Right now I am working on error handling. One person suggested I
>>> implement synchronizing rules like in shift-reduce parsers
>>> <https://github.com/digitalheir/java-probabilistic-earley-parser/issues/5#issuecomment-285617145>.
>>> I figured panic mode should behave a bit differently for probabilistic
>>> ambiguous grammars, so I settled on the following (text taken from issue on
>>> Github):
>>>
>>> Consider the following grammar:
>>>
>>> S -> AB C; (0.5)
>>> S -> A B ; (0.5)
>>>
>>> A -> a a ; (0.5)
>>> B -> b b ; (0.5)
>>> C -> c c ; (0.5)
>>>
>>> AB -> a a ; b b ; (0.5)
>>>
>>> A -> a <error> ; (0.5)
>>> B -> b <error> ; (0.5)
>>> AB -> a <error> ; (0.5)
>>>
>>> On the input a a ; b z ; c c ; with goal S. When encountering z, the
>>> parser could apply both B -> b <error> ; and AB -> a <error> ;. We
>>> would like to consider AB because that's the only way to make S. So we
>>> need to consider both error rules.
>>>
>>> Now consider the same grammar, except the first rule is S -> AB; on the
>>> same input. We have established that we want to consider both error rules,
>>> and in this case both make a valid parse. However, we want the probability
>>> of AB -> a <error> ; being applied to be lower than that of B -> b
>>> <error> ;, because we want to keep the number of ignored tokens minimal.
>>>
>>> That's why I have decided to make the error rule probability multiply
>>> every time a token is ignored. The inner score of applying B -> <error>
>>> ; is then be 0.5^2, and the inner score of AB -> <error> ; is 0.5^4.
>>>
>>>
>>> I suspect that you have more sophisticated knowledge than I do about
>>> Earley parsing, so I seek out your opinion: do you think this is a good
>>> idea? It also makes me a bit uncomfortable that above grammar fails
>>> terminally on a a ; b z ; c c ; cc ; , because the parser gets out of
>>> panic mode after encountering a semicolon. But maybe that should be a
>>> problem for another error handling strategy.
>>>
>>> Which bring me to the Ruby slippers error handling. I don't know if I
>>> have a good idea of how this works. The way I interpret is as follows: the
>>> parser might stall on input which prohibits it from succeeding (no states
>>> follow from predict, no states follow from scan or there are no states to
>>> complete). Because the chart will always contain some active states, you
>>> know what the parser expects to succeed and you could let the user decide
>>> whether to manipulate the input stream so that the parser may proceed given
>>> this information (eg, drop token, insert a token, etc). Is that correct?
>>>
>>> Thanks for your help!
>>>
>>> Maarten Trompper
>>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Questions about error handling in Earley parsers

Reply via email to