Hi Mario,

Indeed set.mm's grammar is unambiguous, and as you write, these are not
true ambiguities, but just prefixes which my initial naive
implementation did not parse correctly. The concept of "garden-path" was
new to me, but describes well this problem: do you think I shall rename
the keyword?

Unfortunately the way I wrote the parser, these are not detected
naturally when building the grammar, and I don't see how they could. The
first pass is simply adds all syntactic axioms to the machine. Each
different construct is added independently, they don't interfere with
each other. In the example of my last mail, there are 3 different paths,
" /setvar/ *e.* /setvar/ ", " *(* /wff/ */\* /wff/ *)* ", and " *(*
/setvar/ *e.* /class/ *|->* /class/ *)* ". Even if the last two share "
*(* " as a common prefix, they differ afterwards because one is followed
by a /setvar/, while the other is followed by a /class/, so the state
machine goes to diverging states at that point.

I completely agree that it would have been better if the parser could
have figured them out purely based on the indication that the grammar is
KLR(5). The way I initially though about handling that was to let the
parser consider each possible combination of production against each
production, and see if there was a conflict. However this would have
been difficult to implement and computationally very intensive. Giving
those hints was the best compromise I could find. I'd be happy if you or
anyone else wants to have a closer look at the algorithm and propose
improvements! In any case, I'll try to write down detailed explanations
so that it is not only in emails.

You have a valid point about the risk of breakage when a new syntax is
introduced. But these prefix ambiguities (or "garden paths") arose in
set.mm only with basic constructs like class builders and maps-to
notations, which we don't introduce anymore, or extremely rarely:
df-opab was introduced in 1994, df-oprab in 1995, and df-mpt in 2008,
and never modified since. So the introduction of such garden paths is
once-in-a-decade event :-)

The meaning of the parser command " ambiguous_prefix X => Y " is "X
shadows Y", i.e., one may initially mistakenly read "X", although "Y"
might also be valid for the same prefix. "X" is always a prefix of a
valid set.mm expression, like " ( x e. A " is a prefix for a mapping-to
expression " ( x e. A |-> B ) ".

As for the syntax of the parser commands, they could also have been put
in quoted strings. The (MM) parser unquotes the string if it finds
quotes, and you can escape quotes, so that the unquoted ( is equivalent
with the quoted '('. For a semicolon, you could write ';', and '\'' for
a single quote. The way you write them also works.

BR,
_
Thierry


On 16/08/2021 12:44, Mario Carneiro wrote:
It sounds like the algorithm is quite similar to the LRParser
implementation (that I have called "KLR" before on the list). It is
almost an LR(0) parse, but then set.mm <http://set.mm> has certain
"ambiguities" (not really the right word, more like a garden path
sentence) that require the grammar to have some productions duplicated
in order to keep to the LR fragment. Currently, there is a $j comment
that says something like "unambiguous 'klr 5';" which is supposed to
indicate that compositing depth 5 is sufficient to eliminate all
ambiguities in the grammar (without this, parse table generation is
only a semi-decision procedure for unambiguity, there is no guarantee
of termination).

On Sun, Aug 15, 2021 at 12:00 PM Thierry Arnoux
<[email protected] <mailto:[email protected]>> wrote:

    Below are parser commands I needed to add to set.mm
    <http://set.mm>, in order to give hints to the parser about those
    ambiguities:

      $( Warn the parser about which particular formula prefixes are
    ambiguous $)
      $( $j ambiguous_prefix ( A   =>   ( ph ;
            type_conversions;
            ambiguous_prefix ( x e. A   =>   ( ph ;
            ambiguous_prefix { <.   =>   { A ;
            ambiguous_prefix { <. <.   =>   { A ;
      $)

    It might have been possible to detect them automatically, but I
    took the shortcut to specify them in set.mm <http://set.mm>. I'll
    kindly ask those lines to be included in the official set.mm
    <http://set.mm> so that the parser works!

What's the syntax of these commands? Meaning aside, I would request
that it stick to only keywords (identifiers) and quoted strings before
the semicolon. Arbitrary tokens will make a mess of things - what if
you need a token that is a semicolon? E.g.:

$( $j ambiguous_prefix '(' 'A' to   '(' 'ph' ;
        type_conversions;
        ambiguous_prefix '(' 'x' 'e.' 'A'   to   '(' 'ph' ;
        ambiguous_prefix '{' '<.'   to   '{' 'A' ;
        ambiguous_prefix '{' '<.' '<.'   to   '{' 'A' ;
  $)

 But I'm not sure how to read them beyond this. The parser I wrote
would just composite more whenever a conflict is detected during parse
table generation (and it can't help but detect the conflict since it
interferes with the generation), so it didn't need hints and can just
figure this out. All other things equal I think it would be better for
the parser to do this itself, since they will have to be kept up to
date with the database and most people won't be in a position to know
how to adjust the hints.

Mario
--
You received this message because you are subscribed to the Google
Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/metamath/CAFXXJSsVMwevtMAR8i0OB3zbni%3D7g3xho3u17soXpBKYnUw6Rg%40mail.gmail.com
<https://groups.google.com/d/msgid/metamath/CAFXXJSsVMwevtMAR8i0OB3zbni%3D7g3xho3u17soXpBKYnUw6Rg%40mail.gmail.com?utm_medium=email&utm_source=footer>.

--
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/b078f051-88d9-0637-4171-6a9d3947ffbe%40gmx.net.

Reply via email to