Perhaps we could differentiate between "closed" and "open" spans: a "closed" span ends in a consonant, and an "open" span ends in a vowel.

All ::= Span
Span ::= Closed_Span | Open_Span
Closed_Span ::= Abbreviation
Closed_Span ::= Closed_Syllable
Closed_Span ::= Span Closed_Syllable
Closed_Span ::= Closed_Span Abbreviation
Open_Span ::= Span Open_Syllable
Open_Span ::= Open_Syllable
Abbreviation ::= C
Closed_Syllable ::= C V C
Open_Syllable ::= C V

I'm a little busy now, so I haven't tested this, but perhaps you get the idea: the spans recurse, with open spans ending in a vowel and closed spans ending in a consonant. Abbreviations may occur in only two places: following a closed span, or at the very beginning of a span. Several abbreviations in a row are allowed, since an abbreviation can end a closed span.
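A quick way to test this outside Marpa is to count parse trees by brute force. The Python sketch below is a stand-in for listing the results of $slr->value(), not Marpa's own algorithm: for each symbol and substring it memoizes the number of distinct trees, trying every rule and every split point. The dictionaries `ORIGINAL` and `CLOSED_OPEN`, the helper `count_parses`, and the rewriting of `Syllable+` as a left-recursive rule are assumptions made for this illustration; the consonant and vowel classes are copied from the original grammar's character classes.

```python
from functools import lru_cache

# Terminal classes, copied from the C and V character classes in the original grammar.
TERMINALS = {
    "C": set("bcdfghjklmnpqrstvwxyz"),
    "V": set("aeiou"),
}

# The original grammar. "All ::= Syllable+" is rewritten as a left-recursive
# rule so that each segmentation corresponds to exactly one tree.
ORIGINAL = {
    "All": [("Syllable",), ("All", "Syllable")],
    "Syllable": [("C", "V", "C"), ("C", "V"), ("C",)],
}

# The proposed closed/open span grammar, rule for rule.
CLOSED_OPEN = {
    "All": [("Span",)],
    "Span": [("Closed_Span",), ("Open_Span",)],
    "Closed_Span": [
        ("Abbreviation",),
        ("Closed_Syllable",),
        ("Span", "Closed_Syllable"),
        ("Closed_Span", "Abbreviation"),
    ],
    "Open_Span": [("Span", "Open_Syllable"), ("Open_Syllable",)],
    "Abbreviation": [("C",)],
    "Closed_Syllable": [("C", "V", "C")],
    "Open_Syllable": [("C", "V")],
}


def count_parses(grammar, start, text):
    """Count distinct parse trees of `text` derived from `start`.

    Assumes no empty rules and no cycle of unit rules, so every recursive
    call is on a shorter substring or cannot loop back over the same one.
    """

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        if sym in TERMINALS:
            # A terminal matches exactly one character from its class.
            return 1 if j == i + 1 and text[i] in TERMINALS[sym] else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[sym])

    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive text[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # Leave at least one character for each remaining symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(text))


for word in ["mat", "mtat", "mattab"]:
    print(word, count_parses(ORIGINAL, "All", word), count_parses(CLOSED_OPEN, "All", word))
```

For "mat" the original grammar yields two trees and the closed/open grammar yields one; for "mattab" the counts are four and one. This only checks the words you feed it, so it is evidence of unambiguity, not a proof.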

@rns: Does this work?  I *think* it's unambiguous.

-- jeffrey

On 09/03/2014 09:35 PM, Andrew Dunbar wrote:
Yes, I'm vague on Data::Dumper, and I don't know much about how Marpa works.

I added the example code from Marpa::R2::ASF so I can compare it with my real code. It no longer seems to be ambiguous, but I can't see what's different about it.
I'm not sure whether I simplified it too much when I made the analogy.

I want rules that mean "only interpret a consonant as an abbreviation when it can't be interpreted as part of a syllable".

I don't know if that's possible of course (-:

I'll see if I can come up with a better analogy based on some actual ambiguities I find in Lao.

On Thursday, 4 September 2014 13:58:23 UTC+10, Jeffrey Kegler wrote:

    What rns did (as I read it) was list all the results of
    $slr->value(). The parse is unambiguous if and only if there is
    exactly one result, which seems to be the case here. (You've been
    away from Perl, so Data::Dumper output may now be hard to read,
    but you can confirm this for yourself by printing a marker line,
    such as "hi there", before the dump of each value, or by printing
    a count.)

    Is your rule that you don't want to allow an abbreviation to
    follow a vowel?
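If that is the rule, it can be stated as a simple filter over flattened parses. This is a hypothetical post-processing check in Python, not a Marpa feature: it assumes each parse has already been turned into a list of syllable strings, and `is_abbreviation` and `follows_vowel_rule` are illustrative names.

```python
VOWELS = set("aeiou")


def is_abbreviation(syllable):
    # In the analogy, an abbreviation is a lone consonant.
    return len(syllable) == 1 and syllable not in VOWELS


def follows_vowel_rule(segmentation):
    """True if no abbreviation directly follows a syllable ending in a vowel."""
    for prev, cur in zip(segmentation, segmentation[1:]):
        if is_abbreviation(cur) and prev[-1] in VOWELS:
            return False
    return True


# The two parses of "mat" reported in the original question.
candidates = [["mat"], ["ma", "t"]]
kept = [seg for seg in candidates if follows_vowel_rule(seg)]
print(kept)  # [['mat']]
```

An abbreviation after a closed syllable, as in ["mat", "t"], still passes the check.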

    -- jeffrey

    On 09/03/2014 08:24 PM, Andrew Dunbar wrote:
    Do we know if that's ambiguous? Don't we have to run it
    through Marpa::R2::ASF to know?

    On Wednesday, 3 September 2014 20:10:42 UTC+10, rns wrote:

        Can you please look at this gist
        <https://gist.github.com/rns/fb6abf62a5fa779957ba>? The
        result is in the comment below it. This might be a solution
        provided that I've got the right idea.

        On Wed, Sep 3, 2014 at 11:44 AM, Andrew Dunbar
        <[email protected]> wrote:

            I've come back to Perl after a long absence just to play
            with Marpa, because it looks like the most full-featured
            Earley parser in any of the programming languages I know.

            I'm interested in Earley specifically because it can
            handle ambiguity and can produce a parse forest.

            I'm using it to investigate the syllable structure of the
            writing system of the Lao language of Southeast Asia,
            specifically to see whether it is inherently ambiguous,
            and if so, how.

            So far it works great and I'm glad I've come here from
            the Bison and PEG grammars I was playing with earlier.

            But it seems that there might be two kinds of ambiguity:
            the kind I'm looking for, and a kind that might be an
            artefact of Earley parsing or of the way I've written the
            grammar.

            Rather than teach you Lao, I'll attempt an analogy:

                All      ::= Syllable+

                Syllable ::= C V C
                           | C V
                           | C

                C ~ [bcdfghjklmnpqrstvwxyz]
                V ~ [aeiou]

            The "Syllable ::= C" rule allows lone initial consonants,
            which are occasionally used as abbreviations.

            If my input string is "mat" I only want:

                (Syllable (C m) (V a) (C t))

            But due to the abbreviation rule I also get a second
            unwanted parse:

                (Syllable (C m) (V a))
                (Syllable (C t))
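Both parses arise because "ma" + "t" and "mat" are each a sequence of valid syllables under the three alternatives. The small Python enumerator below (a brute-force stand-in for Marpa's parse forest; `syllable_ends` and `segmentations` are illustrative names) lists every way to split a word into one or more syllables under Syllable ::= C V C | C V | C.

```python
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
VOWELS = set("aeiou")


def syllable_ends(text, i):
    """Yield each end position j such that text[i:j] is C, C V, or C V C."""
    if i < len(text) and text[i] in CONSONANTS:
        yield i + 1  # Syllable ::= C (the abbreviation rule)
        if i + 1 < len(text) and text[i + 1] in VOWELS:
            yield i + 2  # Syllable ::= C V
            if i + 2 < len(text) and text[i + 2] in CONSONANTS:
                yield i + 3  # Syllable ::= C V C


def segmentations(text, i=0):
    """Yield every split of text[i:] into one or more syllables (Syllable+)."""
    for j in syllable_ends(text, i):
        if j == len(text):
            yield [text[i:j]]
        else:
            for rest in segmentations(text, j):
                yield [text[i:j]] + rest


for seg in segmentations("mat"):
    print(seg)
```

For "mat" it produces ['ma', 't'] and ['mat'], matching the two parses shown in the question.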

            I've been able to refactor my grammar to deal with other
            issues that have appeared, but I can't think of anything
            that accounts for occasional abbreviations without
            generating a number of unwanted alternative parses.

            Can I refactor my grammar, or is there some other way to
            deal with this while still generating all the other kinds
            of ambiguity that I am interested in?
-- You received this message because you are subscribed to
            the Google Groups "marpa parser" group.
            To unsubscribe from this group and stop receiving emails
            from it, send an email to [email protected].
            For more options, visit
            https://groups.google.com/d/optout
            <https://groups.google.com/d/optout>.

