Hi Antony, That's a nice work. @savask did some similar things, see https://github.com/metamath/set.mm/issues/2128. Apparently his search is not exhaustive, so maybe you can see together how your methods differ and how to improve on one another. As for the big list, it would be interesting to have it. In another issue (https://github.com/metamath/set.mm/issues/2710), @savask linked to a file he posted on gist.github.com. I don't know how easy it would be to do it for you.
Note: you wrote on the first line of your first post "proofs that are identical", but the proofs can differ: what you mean is that the statements are identical (more precisely, as you wrote later: the $f, $d, $e and $p statements of the associated frame). And probably not up to variable renaming ? BenoƮt On Monday, August 29, 2022 at 12:24:49 PM UTC+2 [email protected] wrote: > On Sun, Aug 28, 2022 at 5:17 PM David A. Wheeler <[email protected]> > wrote: > > This is really interesting!! Thanks for doing this! > > Thanks! I'm thrilled you think so. > > > I suspect the tool will still find things with some heuristics, but > its results will then be focused on unintentional duplications. > > Cool. I was investigating where this is done intentionally, but it's good > if it can also be used for spotting unintentional duplications. > > I've done the filtering on labels that you asked for (in a single function > called filterAssertion, so it can easily be removed or refined) > > > 1. Omit names with ALT or OLD that appear to duplicate one without the > marking; we already know they're alternatives. > > After reviewing common suffices I actually went with this list: ['ALT', > 'ALT2', 'ALT3', 'OLD', 'ALTN', 'OLDN', 'VD'] > > > 3. Omit names that only differ because of a hyphen (ax-1, ax1). > > And for what it's worth I'll send you (off-list) the 218 lines of output > (down from 661) that we now get, but I think the filters by comment content > that you propose will actually deliver the most value. Being a > (repurposable) validator, checkmm doesn't currently have the capability to > associate the correct comment with an assertion. I think I'd prefer to put > this functionality in in the context of html generation (where it's easy to > test), then circle back around to alt-mm results filtering. > > Best regards, > > Antony > > >> >> >> > On Aug 28, 2022, at 7:19 AM, Antony Bartlett <[email protected]> wrote: >> > >> > I've written a short program (https://github.com/Antony74/alt-mm) that >> lists out assertions (axioms, definitions, and proofs) that are identical >> (and therefore interchangeable). >> >> This is really interesting!! Thanks for doing this! >> >> ... >> > When I run it on set.mm I get 661 lines of output. I'll reproduce the >> first 20 here. It's impossible to know if I've got this completely >> correct, but the output is very plausible, grouping similarly named labels >> suffixed with ALT or OLD. >> > >> > Each line contains the labels representing a group of repeated >> assertions, ordered by the first appearance >> > a1ii, dummylink >> > ax-mp, nic-stdmp, re1axmp, e0a >> > ax-1, minimp-ax1, luklem5, ax1, tbw-ax2, retbwax2, re1tbw2, tb-ax2, >> wl-ax1, ax-frege1 >> > ax-2, minimp-ax2, ax2, re1ax2, wl-ax2, ax-frege2 >> > ax-3, con4, ax3, wl-ax3, ax3h >> > mp2, e00 >> > a1i, wl-a1i, wl-impchain-a1-1 >> > imim2i, wl-imim2i >> > id, idALT, id1, wl-id, frege27 >> > idd, frege26 >> >> Interesting results. I took a look at a few of these. >> >> Sadly, the few cases I reviewed seem to be intentional: >> * ax1 is a proof of axiom ax-1 starting from the *Lukasiewicz* axioms, >> showing >> that our chosen axiom system is at least as powerful as the >> *Lukasiewicz* axioms >> (in the sense of being able to prove at least as much). >> There's a naming convention here: We generally know that >> names-with-hyphens >> can be duplicated by same-name-drop-last-hypen, e.g., ax-1 <-> ax1. >> Also, note that ax1 is marked >> "(Proof modification is discouraged.)" and "(New usage is >> discouraged.)", suggesting >> that this is a pretty special thing. >> * idALT is an alternative proof of id. You can tell because its name adds >> "ALT" to another name. >> * frege27 is specifically *commented* as being the same as id : >> "Identical to ~ id". It's also marked as "(Proof modification is >> discouraged.)" - suggesting >> that something weird/special is going on. >> * con4 notes in its comments that it's an "Alias for ~ ax-3 to be used >> instead of it for labeling consistency." >> So its comment specifically notes the duplicate. You could argue that >> this duplication >> isn't helpful (why not just consistently use the name of the axiom >> directly?), but >> it certainly isn't hidden. >> >> I suggest adding a few heuristics to omit the *intentional* duplicates, >> so we >> can see just the *unintentional* duplicates. I suggest these heuristics: >> 1. Omit names with ALT or OLD that appear to duplicate one without the >> marking; >> we already know they're alternatives. >> 2. Omit names with comments that refer to the other duplicate proof, >> e.g., "~ id" in frege27. >> Obviously we *knew* about the duplications in those cases, since >> they're expressly noted >> in the comments. >> 3. Omit names that only different because of a hyphen (ax-1, ax1). >> 4. MAYBE: Omit names with comments "(Proof modification is discouraged.)" >> or >> "(New usage is discouraged.)"; these are already highly specialized. >> This specific heuristic might omit too much, so I suggest >> experimenting. >> Maybe just omit those that say "(Proof modification is discouraged.)", >> since that >> says we don't want to replace the proof during minimization? >> Or maybe the other heuristics are good enough. >> >> >> What do you think? >> >> > It uses the checkmm validator to parse the .mm file. Kind of wish I'd >> asked here first because it's slightly more involved than I'd anticipated >> to check if two assertions are the same. It's not just the expressions >> that have to match, it's any hypothesis expressions (unordered), and any >> disjoint variable conditions (unordered). What I realised but haven't >> explored is that an assertion lacking a particular hypothesis or disjoint >> variable condition is presumably still compatible, but only in one >> direction. >> >> That would be interesting! But I think you'd have even more "false >> positives" >> in the sense of looking for unintentional duplicates, so I think it'd be >> even more >> important to filter out some cases. >> >> I think there's a kernel of a good idea here, but it needs some >> refining/tweaking >> to keep the output from being overwhelming. >> >> --- David A. Wheeler >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Metamath" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/metamath/B7C2232E-23AF-4B0D-98D9-43481BCB60F8%40dwheeler.com >> . >> > -- You received this message because you are subscribed to the Google Groups "Metamath" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/135aec31-2a5b-4c73-ab4b-22df1ec25935n%40googlegroups.com.
