I've come back to Perl after a long absence just to play with Marpa, because 
it looks like the most full-featured Earley parser in any of the 
programming languages I know.

I'm interested in Earley specifically because it can handle ambiguity and 
can produce a parse forest.

I'm using it to investigate the syllable structure of the writing system of 
the Lao language of Southeast Asia. Specifically, I want to see whether it's 
inherently ambiguous, and if so, how.

So far it works great and I'm glad I've come here from the Bison and PEG 
grammars I was playing with earlier.

But it seems that there might be two kinds of ambiguities, the kind I'm 
looking for, and a kind that might be an artefact of Earley parsing or of 
the way I've written the grammar.

Without having to teach you Lao, I'll attempt an analogy:

All ::= Syllable+

Syllable ::= C V C
         | C V
         | C

C ~ [bcdfghjklmnpqrstvwxyz]
V ~ [aeiou]


The "Syllable ::= C" rule is to allow lone initial consonants, as are used 
occasionally for abbreviations.

If my input string is "mat" I only want:

(Syllable (C m) (V a) (C t))

But due to the abbreviation rule I also get a second unwanted parse:

(Syllable (C m) (V a))
(Syllable (C t))
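
To make the two readings concrete without needing Marpa installed, here is a 
minimal Python sketch that brute-forces every way of splitting a string into 
syllables under the toy grammar above. It is not Marpa and not an Earley 
parser; the names PATTERNS, classify, _splits and parses are made up for 
this illustration only.

```python
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
VOWELS = set("aeiou")

# The three alternatives of the toy Syllable rule, in the order given above.
PATTERNS = ["CVC", "CV", "C"]


def classify(ch):
    """Map a character to its lexeme class, or None if it is neither C nor V."""
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    return None


def _splits(text):
    """Yield every split of text into zero or more syllables."""
    if not text:
        yield []
        return
    for pattern in PATTERNS:
        n = len(pattern)
        chunk = text[:n]
        # The chunk must be long enough and match the pattern class by class.
        if len(chunk) == n and all(classify(c) == p for c, p in zip(chunk, pattern)):
            for rest in _splits(text[n:]):
                yield [chunk] + rest


def parses(text):
    """All parses under 'All ::= Syllable+' (at least one syllable required)."""
    return [p for p in _splits(text) if p]


for p in parses("mat"):
    print(p)
```

For "mat" this prints ['mat'] and ['ma', 't'], which are the wanted parse and 
the unwanted one from the abbreviation rule.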

I've been able to refactor my grammar to deal with other issues that have 
appeared, but I can't seem to think of anything that accounts for 
occasional abbreviations without also generating a number of unwanted 
alternative parses.

Can I refactor my grammar or is there some other way to deal with this but 
still generate all the other kinds of ambiguity that I am interested in?
