On Fri, Nov 8, 2019 at 4:37 AM Benoit <[email protected]> wrote:
> Thanks Giovanni. I hope Mario can also chime in on the questions in my
> previous message, since these are not right-or-wrong issues, but are more
> about balancing pros (proximity with mathematical practice) and cons (more
> complex parser).

I basically agree with Giovanni's answer, but I will give my own with attention to the points in your latest message.

On Thu, Nov 7, 2019 at 1:32 PM Benoit <[email protected]> wrote:

> But I still have a question: is this common practice in computer science?
> It is certainly not in mathematics. It is actually a bit strange from a
> mathematical-practice point of view. For instance, if I want to define an
> R-module M where * is the field multiplication and . is the scalar
> multiplication (so for instance, if a, b \in R and s \in M, I have
> (a * b) . s = a . (b . s) \in M), I would have to type, e.g.:
>
> infix * rassoc precedence=1; (or "lassoc")
> infix . rassoc precedence=2;
>
> and then one of the axioms for an algebra would read "a . b . s = a * b . s".
> By the way: it is necessary to have different precedences, in this example.
> Is this correct?

I have a few reactions to this example. First, it is pushing a bit too heavily on the parser. Regardless of whether the parser can take it, this is near the limit of what I would consider "good practice" regarding appropriate use of notations and associativity for parenthesis-reduction. As you say, the only way to get the parses you want is for "." to be right assoc, and "*" to be left or right assoc with a higher precedence. Anything else would cause the parser to produce the wrong parse, leading to a type error. (Just put parentheses!)

> But it seems strange to specify that an operator of signature "scalar x
> vector -> vector" is either left- or right-associative. Since vector is
> not a subtype of scalar, I should not have to specify that, and "a . b . s"
> can only be read as "a . ( b . s )". If I had typed "infix .
> lassoc precedence=2;" above, would this have triggered an error? (Another
> example is inner products, since "scalar" is not a subtype of "vector".)
> Or would this require a too complex parser?

Yes, you have to specify that it is right associative, and if you don't, then you will have to put in lots of parentheses to make type-correct statements. It may seem strange that you have to redundantly specify this information, but it's not really redundant. How can MM0 know that scalar is not a subtype of vector? It may not be right now, but perhaps later you decide to add a coercion, and then it becomes ambiguous.

But the more important part comes back to why MM0 has a parser in the first place. From its inception, the parser was supposed to be some reasonable algorithm that guarantees that the underlying CFG is unambiguous. The MM0 parser specification is not deterministic; it is based on a CFG, so the constraints are necessary to ensure that any reasonable parsing algorithm will find the correct parse. The shunting-yard algorithm or what have you does *not* define the grammar, it is merely the implementation. This is why issues such as the one Giovanni brought up in the first post are actual problems for the functional correctness proof.

The problem with using type information for parsing is that the user could just as easily have written ". : vector x vector -> vector" with everything else the same, and now the grammar is ambiguous. So the static checks must somehow exclude this case. Moreover, if you exclude bad parses by type information you can have much more complicated edge cases. Suppose "@" and "%" have ambiguous fixity, but have the types

@ : A x B -> C
% : C x D -> B

Then (a @ x % d) can either be parsed as ((a @ x) % d), where x: B and the expression has type B, or (a @ (x % d)), where x: C and the expression has type C.
If I throw in

^ : B x E -> B

then (a @ x % d ^ e) has two valid parses: (((a @ x) % d) ^ e) (of type B, assuming x: B) or (a @ ((x % d) ^ e)) (of type C, assuming x: C). These examples seem very sketchy to me. It is conceivable that there is an algorithm that decides whether some set of operators (given their types and the coercion graph) is unambiguous or not, but I cannot believe that it is at all a simple algorithm. It seems like you would have exponentially many paths to try because of all the local ambiguities.

Finally, I want to point out that your example is slightly unrealistic. The MM0 type system is (somewhat deliberately) impoverished, and it would be very unlikely for you to actually have types "scalar" and "vector" unless you are in a (very limited) kind of first order proof environment for modules. You wouldn't be able to do much interesting vector space theory there, because you have no higher order constructs. Much more likely, you are working in a ZFC style foundation where everything has the same type "set", and then type information is next to useless for this kind of thing.

> In the example above, I would much rather have to type, e.g.:
>
> infix * rassoc precedence=1;
> infix . precedence=2;
>
> and that if I had added "rassoc" or "lassoc" in the second line, this
> would have triggered an error.

I will add that I fell into this (cognitive) trap myself, and Giovanni caught me on it. I said to myself "It doesn't make any sense for "e." to be right or left associative, because it has type set x set -> wff, so it can't associate with itself anyway, so I'll just make it left assoc". But this is not true, because "~" lives at the same precedence, and if "e." is left assoc then there is a conflict, while if "e." were right assoc then there would be no conflict. Because *types don't matter during parsing* (at least, with the MM0 parser algorithm as it currently exists).
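To see concretely that the @/%/^ example really admits exactly two type-correct readings, here is a throwaway brute-force check (a Python sketch, nothing to do with MM0's actual implementation): it enumerates every parenthesization of a @ x % d ^ e and type-checks each one against the signatures above, trying both candidate types for x.

```python
# Signatures from the example in the message.
SIGS = {'@': ('A', 'B', 'C'),   # @ : A x B -> C
        '%': ('C', 'D', 'B'),   # % : C x D -> B
        '^': ('B', 'E', 'B')}   # ^ : B x E -> B

def parses(atoms, ops):
    """Yield every binary parse tree of `atoms` joined by `ops` (in order)."""
    if not ops:
        yield atoms[0]
        return
    for i, op in enumerate(ops):
        for l in parses(atoms[:i+1], ops[:i]):
            for r in parses(atoms[i+1:], ops[i+1:]):
                yield (op, l, r)

def typeof(tree, env):
    """Type of a parse tree, or None if it doesn't typecheck."""
    if isinstance(tree, str):
        return env[tree]
    op, l, r = tree
    tl, tr, res = SIGS[op]
    return res if typeof(l, env) == tl and typeof(r, env) == tr else None

def show(tree):
    if isinstance(tree, str):
        return tree
    op, l, r = tree
    return f'({show(l)} {op} {show(r)})'

good = []
for x_ty in ('B', 'C'):  # the type of x is not determined syntactically
    env = {'a': 'A', 'x': x_ty, 'd': 'D', 'e': 'E'}
    for t in parses(['a', 'x', 'd', 'e'], ['@', '%', '^']):
        ty = typeof(t, env)
        if ty is not None:
            good.append((show(t), f'x: {x_ty}', f'type {ty}'))

# Exactly the two parses from the example survive type-checking.
for g in good:
    print(g)
```

Of the five possible parenthesizations, only (((a @ x) % d) ^ e) with x: B and (a @ ((x % d) ^ e)) with x: C typecheck, and a type-driven parser would have no way to choose between them.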
I've written a simple C compiler before, and the "A * b;" example of type-disambiguation between a multiplication statement and a pointer declaration is a real thing (and a real pain). I don't want to replicate that.

> You write:
>
>> In your example, you definitely want "." to be right associative, so that
>> a . b . s = a . (b . s) as you say.
>
> Not only do I want it, but it is the only possible parse, for type
> reasons. Would it have triggered an error to type "infix . lassoc prec=1"?
> Or would it trigger an error only when I write "a . b . s" without
> parentheses? (In any case, this would be unfortunate.)

It would not be an error to type "infixl smul: $.$ prec 1;". It would be perfectly valid, but probably less convenient than you would like, because you would have to add lots of parentheses to write things like "a . (b . s)". You could try to write "a . b . s", but it would be interpreted as "(a . b) . s" and reported as a type error.

>> You also want * to be higher
>> precedence, because you probably want a * b . s = (a * b) . s.
>
> Idem: it is the only possible parse for type reasons. Would it have
> triggered an error to give higher precedence to the scalar multiplication?
> Or would it trigger an error only if I write "a * b . s" without
> parentheses? Here, I would be ok to have to always type "(a * b) . s" and
> to be forbidden to type "a * b . s". In other words, is it possible to
> define operation associativities and precedences which require parentheses
> to avoid ambiguity, without triggering errors at operation definitions?
> Allowing this might indeed require a too complex parser, and this is
> probably not MM0's job (which should probably allow minimal syntactic
> sugar as long as this does not complexify things too much).

If you are okay with always writing "(a * b) . s", then * can have whatever precedence you want (unless you make * and . have the same precedence and opposite associativity).
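Since types play no role during parsing, this behavior is easy to reproduce with an ordinary precedence-climbing parser. The following is a toy Python sketch of that standard algorithm, not MM0's actual parser; the operator table format and the convention that a left-assoc operator parses its right operand one precedence level higher are assumptions of the sketch.

```python
def parse(tokens, table):
    """Precedence climbing over a token list.
    table maps operator -> (precedence, 'l' or 'r')."""
    pos = 0
    def expr(min_prec):
        nonlocal pos
        lhs = tokens[pos]; pos += 1      # operands are single atoms here
        while pos < len(tokens) and tokens[pos] in table:
            op = tokens[pos]
            prec, assoc = table[op]
            if prec < min_prec:
                break
            pos += 1
            # lassoc: right operand must bind strictly tighter (prec + 1);
            # rassoc: right operand may recur at the same level (prec)
            rhs = expr(prec + 1 if assoc == 'l' else prec)
            lhs = f'({lhs} {op} {rhs})'
        return lhs
    return expr(0)

rtable = {'*': (2, 'r'), '.': (1, 'r')}  # "." declared right assoc
ltable = {'*': (2, 'r'), '.': (1, 'l')}  # "." declared left assoc

print(parse('a . b . s'.split(), rtable))  # (a . (b . s))
print(parse('a . b . s'.split(), ltable))  # ((a . b) . s)  -- a type error in the module example
print(parse('a * b . s'.split(), rtable))  # ((a * b) . s)  -- since * binds tighter
```

The parser happily produces "((a . b) . s)" under the left-assoc table; only a later type-checking pass would reject it, which is exactly the "infixl smul" scenario described above.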
> Side remark: I agree with the treatment of prefixed operators, which looks
> like the most sensible one. Is there a reason why suffixed operators are
> not allowed in MM0? This sounds unfortunate since they are often used in
> math (inverse, transpose, conjugate...).

I was actually thinking about changing this, and this thread in particular has clarified matters for me regarding the CFG static analysis enough that I think it could be done without complicating the algorithm. This came up for me with "mixfix" operators. With MM0 currently, you can define "a + b", but you can't write "a +g[ G ] b" representing a general group addition, because the +g notation does not begin with a constant. But it can be analyzed essentially like any other infix notation: it is marked either left or right assoc, and the inner variables are treated like they would be in a general notation. This could be declared using general notation as:

notation gmul (a G b: set): set = a ($+g[$:50) G ($]$:0) b : lassoc 50;

(The main thing I don't like about this proposal is that it adds new keywords like "lassoc" and such, since "notation" is a little limited in the way it describes the precedence of the first and last variables.) This turns into the rule

((a:50) +g[ (G:1) ] (b:51) : 50)

where the first variable stays at prec 50 because it is lassoc. For a rassoc notation it would be

((a:51) +g[ (G:1) ] (b:50) : 50)

instead. Like other general notations, the first constant is required to be unique, but in this case that means the thing after the variable. Two variables are not allowed at the start, i.e. a notation "a b + c" is not okay, although "a + G b" is, where G gets parsed with max precedence (meaning that it will either be atomic or in parentheses).

With a notation system that can support an initial variable like this, it is not difficult to leave off the trailing variable, resulting in a postfix notation such as

((a:50) ! : 50)

which is left assoc.
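The precedence bookkeeping behind those rules is small enough to state as code. This is only a sketch of one reading of the proposal above (the function name and tuple representation are made up, not MM0 syntax): for a notation at precedence p, the side it associates on parses at p and the opposite side at p + 1.

```python
def edge_precs(prec, assoc):
    """Parsing precedences (leading variable, trailing variable) for an
    infix-style mixfix notation declared at `prec` with `assoc`.
    The associating side stays at `prec`; the other side is `prec + 1`."""
    return (prec, prec + 1) if assoc == 'lassoc' else (prec + 1, prec)

for assoc in ('lassoc', 'rassoc'):
    first, last = edge_precs(50, assoc)
    # Reconstructs the annotated rules quoted above for "a +g[ G ] b".
    print(f'{assoc}: ((a:{first}) +g[ (G:1) ] (b:{last}) : 50)')
```

This reproduces the two rules quoted above: lassoc gives ((a:50) ... (b:51) : 50) and rassoc gives ((a:51) ... (b:50) : 50).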
(You could also have a right assoc postfix notation, but you would never want to, because you could not nest it with itself, i.e. you would have to write "(a !) !" instead of "a ! !".) Since it follows the same rules regarding uniqueness of associativity per precedence level, this doesn't introduce any new ambiguities, and the same precedence parser algorithm can handle general notations in infix position without backtracking, because of the unique constant requirement.

> Another remark: if one retains, for simplicity of the parser (and
> neglecting proximity with mathematical practice), that a given precedence
> is associated with either left- or right-associative operations, one could
> even impose from the beginning that even precedence implies
> right-associativity and odd precedence implies left-associativity, hence
> doing away with the keywords "lassoc" and "rassoc".

I suppose that's true, but it's not obvious. MM0 is designed to be read, and having mystery rules like that seems like a good way to scare people off. The precedence system is supposed to be minimal and expressive, but not surprising, and I can already see the angry github issues asking why changing a precedence from 42 to 43 with nothing else nearby breaks their code. (MM0 actually doesn't have keywords "lassoc" and "rassoc", by the way, although the proposal above would add them. It uses "infixl" and "infixr" instead.)

Regarding the barb about neglecting mathematical practice, I still don't see your argument. Your example with modules fits perfectly well within the MM0 precedence system. Yes, you have to write those precedences down, and if you get it wrong then things don't typecheck, but that doesn't mean that "." isn't right associative in the example.
It's *obviously* right associative, because the other option doesn't typecheck; but you seem to be taking from this fact that it's not associative at all, which isn't true. A non-associative operator (which MM0 doesn't support) would *always* require parentheses.

There are plenty of other issues you could raise with the MM0 notation system regarding the neglect of mathematical practice, in particular the massive overloading of operators and implicit parameters. MM0 takes a similar stance to metamath here, so it's probably not too surprising to the folks on this forum, but I've gotten comments on it elsewhere, and I can only point to the added disambiguation complexity and lack of clarity that comes with those systems.

Mario

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/CAFXXJSsVtYnERngWyc4OEciZZPQVFK53Gb%2BuX_-5cu_0azxXmQ%40mail.gmail.com.
