In the post repeatedly re-proven statements 
<https://groups.google.com/g/metamath/c/IG1OPO0fkj0/m/4JXoI9uoBwAJ> David 
A. Wheeler found statements which appear in different proofs again and 
again. I decided to search for "repeatedly re-proven lemmata", i.e. 
sequences of theorem labels which often appear as sub-proofs in other 
theorems.

The rough algorithm is as follows. Say, we want to find lemmata with k 
steps. We take a compressed proof of some theorem T, and check all k-step 
subsequences L of its proof. We count a subsequence valid if

1) All its steps reference theorem labels (not a hypothesis and not a step 
marked with Z);
2) All its steps correspond to |- statements (to avoid counting syntax 
builder subproofs as lemmata);
3) L leaves exactly one theorem on the proof stack provided it is supplied 
with enough hypotheses (i.e. L is indeed a subproof, so subsequences like 
"0re, 0re" are not counted);
4) At least one of the theorems used in the subsequence L is used exactly 
once in the proof of T (otherwise minimizing with L would append an 
additional theorem label to the compressed proof of T, and likely that 
wouldn't minimize the proof).

Next we find all valid subsequences, and give each one a score by the 
formula (length - 1)*(number of theorems where it occurs). This gives an 
approximation to the number of saved bytes.

I did this search in set.mm for subsequences of lengths from 2 to 7, took 
top 20 of each length and combined into the following list:

https://gist.github.com/savask/13649ae9f1114544dd3d79f3ba01622f

The first component of each tuple is the subsequence and the second is its 
score.

For example, the first line is (["fvex","eqeltri"],791). By 
reverse-engineering the associated statement in mmj2 we get a reasonable 
theorem

${
  test1.1 $e |- A = ( F ` B ) $.
  test1 $p |- A e. _V $=
    ( cfv cvv fvex eqeltri ) ABCEFDBCGH $.
$}

If we take the line (["0re","1re","elicc2i"],88) we get

${
  test2 $p |- ( C e. ( 0 [,] 1 ) <-> ( C e. RR /\ 0 <_ C /\ C <_ 1 ) ) $=
    ( cc0 c1 0re 1re elicc2i ) BCADEF $.
$}

This is a version of elicc2i specialized to the unit interval. Apparently 
it is used quite often.

There are some mysterious proof subsequences in set.mm, for example 
consider (["toponunii","restid","ax-mp","eqcomi"],93). When expanded it 
gives

${
  test3.1 $e |- A e. C $.
  test3.2 $e |- A e. ( TopOn ` B ) $.
  test3 $p |- A = ( A |`t B ) $=
    ( crest co wcel wceq toponunii restid ax-mp eqcomi ) 
ABFGZAACHNAIDACBBAEJKLM $.
$}

This looks like a theorem with a completely useless hypothesis test3.1, but 
believe me or not, it does appear in some proofs, for example, in cncfcn1 
(one can shorten its proof from 172 to 116 bytes using test3). If we 
correct the statement a little bit, we get a saner theorem

${
  test4.1 $e |- A e. ( TopOn ` B ) $.
  test4 $p |- A = ( A |`t B ) $=
  ( crest co ctopon wcel wceq toponunii restid ax-mp eqcomi cfv ) 
ABDEZAABFMZGNAHCAOBBACIJKL $.
$}

Which also minimizes the proof of cncfcn1 but this time to 98 bytes.

There are 120 proof sequences in the list I linked above, and surely not 
all of them deserve to be set.mm theorems (although many might indeed save 
byte usage, though I haven't been able to run a full minimize run with 
theorems test1-test4 as it's too resource intensive for my computer). I'm 
not an active set.mm contributor, so I don't know which theorems look 
reasonable, how to name them, where to put them in set.mm etc. If anyone is 
interested in this, I encourage them to take any theorem they like from the
list (including test1-test4) and add them into set.mm - no reference to me 
or the list is needed. 

I would be glad to hear your opinions and suggestions.

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/9cfdac9b-4495-47a3-862b-47953a7e56b3n%40googlegroups.com.

Reply via email to