In the post repeatedly re-proven statements <https://groups.google.com/g/metamath/c/IG1OPO0fkj0/m/4JXoI9uoBwAJ> David A. Wheeler found statements which appear in different proofs again and again. I decided to search for "repeatedly re-proven lemmata", i.e. sequences of theorem labels which often appear as sub-proofs in other theorems.
The rough algorithm is as follows. Say, we want to find lemmata with k steps. We take a compressed proof of some theorem T, and check all k-step subsequences L of its proof. We count a subsequence valid if 1) All its steps reference theorem labels (not a hypothesis and not a step marked with Z); 2) All its steps correspond to |- statements (to avoid counting syntax builder subproofs as lemmata); 3) L leaves exactly one theorem on the proof stack provided it is supplied with enough hypotheses (i.e. L is indeed a subproof, so subsequences like "0re, 0re" are not counted); 4) At least one of the theorems used in the subsequence L is used exactly once in the proof of T (otherwise minimizing with L would append an additional theorem label to the compressed proof of T, and likely that wouldn't minimize the proof). Next we find all valid subsequences, and give each one a score by the formula (length - 1)*(number of theorems where it occurs). This gives an approximation to the number of saved bytes. I did this search in set.mm for subsequences of lengths from 2 to 7, took top 20 of each length and combined into the following list: https://gist.github.com/savask/13649ae9f1114544dd3d79f3ba01622f The first component of each tuple is the subsequence and the second is its score. For example, the first line is (["fvex","eqeltri"],791). By reverse-engineering the associated statement in mmj2 we get a reasonable theorem ${ test1.1 $e |- A = ( F ` B ) $. test1 $p |- A e. _V $= ( cfv cvv fvex eqeltri ) ABCEFDBCGH $. $} If we take the line (["0re","1re","elicc2i"],88) we get ${ test2 $p |- ( C e. ( 0 [,] 1 ) <-> ( C e. RR /\ 0 <_ C /\ C <_ 1 ) ) $= ( cc0 c1 0re 1re elicc2i ) BCADEF $. $} This is a version of elicc2i specialized to the unit interval. Apparently it is used quite often. There are some mysterious proof subsequences in set.mm, for example consider (["toponunii","restid","ax-mp","eqcomi"],93). When expanded it gives ${ test3.1 $e |- A e. C $. test3.2 $e |- A e. ( TopOn ` B ) $. test3 $p |- A = ( A |`t B ) $= ( crest co wcel wceq toponunii restid ax-mp eqcomi ) ABFGZAACHNAIDACBBAEJKLM $. $} This looks like a theorem with a completely useless hypothesis test3.1, but believe me or not, it does appear in some proofs, for example, in cncfcn1 (one can shorten its proof from 172 to 116 bytes using test3). If we correct the statement a little bit, we get a saner theorem ${ test4.1 $e |- A e. ( TopOn ` B ) $. test4 $p |- A = ( A |`t B ) $= ( crest co ctopon wcel wceq toponunii restid ax-mp eqcomi cfv ) ABDEZAABFMZGNAHCAOBBACIJKL $. $} Which also minimizes the proof of cncfcn1 but this time to 98 bytes. There are 120 proof sequences in the list I linked above, and surely not all of them deserve to be set.mm theorems (although many might indeed save byte usage, though I haven't been able to run a full minimize run with theorems test1-test4 as it's too resource intensive for my computer). I'm not an active set.mm contributor, so I don't know which theorems look reasonable, how to name them, where to put them in set.mm etc. If anyone is interested in this, I encourage them to take any theorem they like from the list (including test1-test4) and add them into set.mm - no reference to me or the list is needed. I would be glad to hear your opinions and suggestions. -- You received this message because you are subscribed to the Google Groups "Metamath" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/9cfdac9b-4495-47a3-862b-47953a7e56b3n%40googlegroups.com.
