I've come back to Perl after a long absence just to play with Marpa because
it looks like the most full-featured Earley parser in any of the
programming languages I know.
I'm interested in Earley specifically because it can handle ambiguity and
can produce a parse forest.
I'm using it to investigate the syllable structure of the writing system of
the Lao language of Southeast Asia. Specifically, I want to see whether it's
inherently ambiguous, and how.
So far it works great and I'm glad I've come here from the Bison and PEG
grammars I was playing with earlier.
But it seems that there might be two kinds of ambiguity: the kind I'm
looking for, and a kind that might be an artefact of Earley parsing or of
the way I've written the grammar.
Without having to teach you Lao I'll attempt to analogize:
    All      ::= Syllable+
    Syllable ::= C V C
               | C V
               | C
    C ~ [bcdfghjklmnpqrstvwxyz]
    V ~ [aeiou]
The "Syllable ::= C" rule is to allow lone initial consonants, as are used
occasionally for abbreviations.
If my input string is "mat" I only want:
    (Syllable (C m) (V a) (C t))
But due to the abbreviation rule I also get a second unwanted parse:
    (Syllable (C m) (V a))
    (Syllable (C t))
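To make the two parses concrete without involving Marpa at all, here is a rough brute-force sketch in Python that enumerates every way of splitting a string into syllables under the toy grammar. It is not how Marpa works, and the names (`SHAPES`, `classify`, `parses`, `show`) are just ones I made up for illustration. One known difference: for empty input it yields a single empty split, whereas `Syllable+` requires at least one syllable.

```python
# Toy grammar from above: Syllable ::= C V C | C V | C
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
VOWELS = set("aeiou")

# Each alternative of the Syllable rule, written as a string of symbol classes.
SHAPES = ["CVC", "CV", "C"]


def classify(ch):
    """Return 'C' or 'V' for a character, or None if it is in neither class."""
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    return None


def parses(text):
    """Yield every split of text into syllables, each as a list of strings."""
    if not text:
        yield []  # nothing left to consume: one (empty) way to finish
        return
    for shape in SHAPES:
        head = text[:len(shape)]
        if len(head) == len(shape) and all(
            classify(ch) == sym for ch, sym in zip(head, shape)
        ):
            # This alternative matches a prefix; try every parse of the rest.
            for rest in parses(text[len(shape):]):
                yield [head] + rest


def show(syllable):
    """Format one syllable in the bracketed style used in this message."""
    inner = " ".join(f"({classify(ch)} {ch})" for ch in syllable)
    return f"(Syllable {inner})"


if __name__ == "__main__":
    for parse in parses("mat"):
        print(" ".join(show(s) for s in parse))
```

Since this does no Earley parsing, any split it reports follows from the grammar rules alone, so it may be a useful cross-check on which ambiguities come from the way the grammar is written.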
I've been able to refactor my grammar to deal with other issues that have
appeared, but I can't seem to think of anything which accounts for
occasional abbreviations but doesn't generate a number of unwanted
alternative parses.
Can I refactor my grammar or is there some other way to deal with this but
still generate all the other kinds of ambiguity that I am interested in?