Thanks :-)

Yes, the ones for the second proposal are ties, but I wouldn't say it's 
worth keeping them all, since huge numbers of ties could be generated from 
what my tool produces and from the information such a proof – and the 
collection as a whole – already contains. Unless you mean keeping just two 
variants each, the historically first discovered and the alphanumerically 
minimal one, since both are "relevant" in a way.

Let me explain this and my approach a bit. (Sorry for the incoming wall of 
text. Feel free to skim at your own pace. :-)

A parser can quickly extract all subproof strings. For example, the proof 
DD2D1DD22D11DD2D112 (length 19) for theorem *1.5 has 7 non-trivial proper 
subproofs (i.e. excluding axioms and the proof itself): D11, D22, D2D11, 
DD22D11, DD2D112, D1DD22D11, and D2D1DD22D11. Since every proof is 
essentially a modus ponens (two inputs and one output), a proof of length 
*n* ≥ 3 (lengths are always odd) has up to (*n*-3)/2 non-trivial proper 
subproofs (there may be overlaps).
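
For illustration, here is a minimal sketch of how such subproof strings 
could be collected from a D-proof string. This is not my actual parser 
(which builds formula trees along the way); it just shows the idea:

// Collect the distinct non-trivial proper subproof strings of a
// condensed-detachment proof such as "DD2D1DD22D11DD2D112". The characters
// '1', '2', '3' denote axioms; 'D' denotes a detachment of two subproofs.
#include <cstddef>
#include <iostream>
#include <set>
#include <string>

// Length of the subproof starting at position pos: read symbols until the
// number of still-missing arguments drops to zero.
std::size_t subproofLength(const std::string& proof, std::size_t pos) {
    int need = 1;
    std::size_t i = pos;
    while (need > 0) {
        need += proof[i] == 'D' ? 1 : -1; // 'D' fills one slot but opens two
        ++i;
    }
    return i - pos;
}

int main() {
    const std::string proof = "DD2D1DD22D11DD2D112"; // proof of *1.5
    std::set<std::string> subproofs;                 // deduplicates overlaps
    for (std::size_t i = 1; i < proof.size(); ++i)   // i = 0 is the proof itself
        if (proof[i] == 'D')                         // skip bare axioms
            subproofs.insert(proof.substr(i, subproofLength(proof, i)));
    for (const std::string& s : subproofs)
        std::cout << s << '\n';                      // prints the 7 subproofs above
}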

These numbers can grow quite large for longer proofs, and there may be many 
overlaps; e.g. my 1883-step proof of biass alone has 222 different 
non-trivial proper subproofs. Counting this way, the unified collection 
(pmproofs-unified.txt) consists of 1374 proofs (all listed explicitly in 
data/pmproofs_1-unified.dpc), and FYI the non-unified collection consists of 
1436 proofs (listed explicitly in data/pmproofs_1.dpc).

Now each such proof has a final formula (the conclusion), and many of those 
formulas have several shortest (known) proofs. There may even be cases where 
both a formula and a different schema of that formula are suitable, each 
with a minimal proof of the same length.

Now, for a proof that uses a certain conclusion *m* times, where that 
conclusion has *n* shortest (known) proofs, there are *n* to the power of 
*m* ways to insert proofs of just that conclusion into the final proof 
(e.g. a conclusion used 3 times with 4 shortest proofs already yields 
4^3 = 64 variants). So we are talking about exponential growth of 
possibilities based on the number of overlaps and redundant shortest (known) 
proofs. There are certainly quite a number of other 1883-step proofs of 
biass that could be derived from the information already in pmproofs.txt – 
but not from pmproofs-unified.txt, since the overlaps have been removed 
there.


Now for the overall approach: my parser creates the list of non-trivial 
subproofs merged from all proofs, sorts them by length, and parses them all 
to their conclusions.
Then I used files I had generated, which contain proofs of all possible 
conclusions of proofs up to length 29, to find better variants, using a lot 
of schema checks (my formulas are actually tree structures). If a conclusion 
*A* from my files is a schema of a conclusion *B* from pmproofs.txt, all 
string occurrences of proof(*B*) within pmproofs.txt can simply be replaced 
by proof(*A*), which keeps all the proofs valid and ideally shortens them.
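
To sketch the schema check itself (simplified; my actual formula 
representation and matching are more involved), one can match the candidate 
schema against the other formula on tree structures, binding each of its 
variables consistently:

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Formulas as trees: leaves are variables, inner nodes are connectives.
struct Formula {
    std::string value;                               // e.g. "->", "~", "p"
    std::vector<std::shared_ptr<Formula>> children;  // empty for variables
    bool isVariable() const { return children.empty(); }
};

bool equal(const Formula& a, const Formula& b) {
    if (a.value != b.value || a.children.size() != b.children.size())
        return false;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        if (!equal(*a.children[i], *b.children[i]))
            return false;
    return true;
}

// Match schema candidate 'a' against 'b': connectives must coincide, and
// each variable of 'a' must always bind the same subformula of 'b'.
bool matches(const Formula& a, const Formula& b,
             std::map<std::string, const Formula*>& bindings) {
    if (a.isVariable()) {
        auto [it, inserted] = bindings.emplace(a.value, &b);
        return inserted || equal(*it->second, b);
    }
    if (a.value != b.value || a.children.size() != b.children.size())
        return false;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        if (!matches(*a.children[i], *b.children[i], bindings))
            return false;
    return true;
}

bool isSchemaOf(const Formula& a, const Formula& b) {
    std::map<std::string, const Formula*> bindings;
    return matches(a, b, bindings);
}

std::shared_ptr<Formula> var(const std::string& name) {
    return std::make_shared<Formula>(Formula{name, {}});
}
std::shared_ptr<Formula> imp(std::shared_ptr<Formula> l,
                             std::shared_ptr<Formula> r) {
    return std::make_shared<Formula>(Formula{"->", {std::move(l), std::move(r)}});
}

int main() {
    auto a = imp(var("p"), var("p"));                                // p -> p
    auto b = imp(imp(var("q"), var("r")), imp(var("q"), var("r")));  // (q->r) -> (q->r)
    std::cout << std::boolalpha << isSchemaOf(*a, *b) << '\n';       // true
    std::cout << std::boolalpha << isSchemaOf(*b, *a) << '\n';       // false
}

Whenever such a check succeeds for a conclusion pair (*A*, *B*), replacing 
all occurrences of proof(*B*) by proof(*A*) is a candidate improvement 
(beneficial whenever proof(*A*) is shorter).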

So I collected all the best potential replacements and then applied them 
from long to short.
Not too interesting, I'd say, but rather straightforward (it took me one day 
once all the basic functionality was already implemented).
What really took time and great effort was the heavily optimized 
implementation of the D-proof-to-formula (tree structure) parser and subtree 
checks, and the generator of all possible outcomes. For instance, I can 
parse pmproofs.txt in less than 170 ms (to what you see in 
data/pmproofs_2.dpc) using a single thread of my old Intel Core i7-3770, but 
I use multiple threads to compare and generate results.

The generator of all provable formulas is fully parallelized via Intel's 
oneTBB library. I essentially implemented a push-down automaton to generate 
all possible D-proofs of length *n*+2 once all proofs of lengths up to *n* 
are already known, and to do this step by step (i.e. file by file). But I 
sorted out all redundancies, including proofs whose conclusion has a schema 
with a known proof that is at least as short. So each remaining proof has a 
conclusion that represents a whole bunch of other proofs and potentially 
other formulas. In short, I kept the amount of data as small as possible.
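
To give an idea of the basic combination step (a toy sketch only; the real 
generator additionally parses every candidate to its conclusion and applies 
the redundancy filters just described, all in parallel), candidates of 
length *n*+2 arise as 'D' followed by two known proofs whose lengths sum to 
*n*+1:

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // representatives[k]: non-redundant proofs of length k; only lengths 1
    // and 3 are filled in here, and the length-3 entries are illustrative only.
    std::map<std::size_t, std::vector<std::string>> representatives = {
        {1, {"1", "2", "3"}},
        {3, {"D11", "D12", "D13", "D21", "D22", "D23"}}
    };
    const std::size_t n = 3;              // everything up to length n is known
    std::vector<std::string> candidates;  // candidate proofs of length n + 2
    for (std::size_t lenA = 1; lenA <= n; lenA += 2) {
        const std::size_t lenB = n + 1 - lenA;
        for (const std::string& a : representatives[lenA])
            for (const std::string& b : representatives[lenB])
                candidates.push_back("D" + a + b);
    }
    std::cout << candidates.size() << " candidates of length " << (n + 2)
              << " before parsing and redundancy filtering\n";
}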
Here are the numbers of new representative proofs for each length up to 29:
1 : 3
3 : 6
5 : 12
7 : 38
9 : 89
11 : 229
13 : 672
15 : 1844
17 : 5221
19 : 15275
21 : 44206
23 : 129885
25 : 385789
27 : 1149058
29 : 3449251
(5181578 representatives in total)

For example, dProofs29.txt is already 103.477529 megabytes (that is 
3449251*(29+1)-1 bytes) in size, and all the parsed data from those files, 
including the formula trees of all the conclusions, just barely fits into my 
32 GiB of DDR memory. What currently takes a lot of time are the schema 
checks, of which there are far too many (quadratic complexity; pairwise 
comparisons limited only by compatible formula lengths), and I am still 
hoping to come up with a database structure to make this more efficient. 
(For the 27->29 step, everything else took just under an hour on my PC, but 
the schema filtering took over 17 days.) The gain from schema filtering is 
small, but it still has an exponential effect on subsequent files. (926015 
proofs were filtered out from the proofs of length 29 in that manner.) 
Anyway, a supercomputer could probably produce further shortest proofs using 
my HPC-ready program, and I will try to get access to one.


I hope this is somewhat readable; if something is still unclear, feel free 
to ask. And let's hope at least David is still alive .. (´ ∀ ` *)

— Samiro Discher


Benoit schrieb am Sonntag, 9. Oktober 2022 um 14:50:47 UTC+2:

> Congratulations !
>
> Since that page of the website is not on git, I think only David can add 
> your proposals.  I checked both using drule.c.
>
> As for your second proposal, that happened because you found "ties" among 
> shortest known proofs, is that it ?  Maybe it's worth keeping them all ?
>
> I'm curious to see that C++ tool when you publicly release it.  I don't 
> know when the number of "21 steps" in pmproofs.txt for exhaustive search 
> was written, but being able to reach 29 steps, given the exponential nature 
> of the problem, seems impressive.  And also for the non-exhaustive part, 
> the heuristics you're using are probably interesting.
>
> Benoît
>
>
> On Thursday, October 6, 2022 at 9:50:28 PM UTC+2 Samiro Discher wrote:
>
>> I'd like to contribute my findings to 
>> https://us.metamath.org/mmsolitaire/pmproofs.txt.
>>
>> I found several (17) shorter proofs. See 'pmproofs.txt' at 
>> https://github.com/xamidi/mmsolitaire [changes: 2be2349 
>> <https://github.com/xamidi/mmsolitaire/commit/2be2349a5621eaaee5d5940e98202bd265c9d50a>
>> ]:
>> *2.62:      89 ->   77
>> *3.31:     103 ->   83
>> *3.43:     137 ->  117
>> *3.44:     203 ->  183
>> *4.14:     321 ->  283
>> *4.32:     355 ->  317
>> *4.39:     609 ->  479
>> *4.41:     271 ->  249
>> *4.76:     271 ->  249
>> *4.77:     375 ->  287
>> *4.86:     617 ->  557
>> *4.87:     523 ->  447
>> *5.31:     119 ->  109
>> *5.6:      203 ->  167
>> *5.75:     387 ->  351
>> meredith:  145 ->  141
>> biass:    2127 -> 1883
>>
>> Furthermore, I modified another 26 proofs such that all subproofs of the 
>> same formula (ranging over the whole collection) are equal (preferring the 
>> lexicographically 
>> smaller proof strings). See 'pmproofs-unified.txt'. Feel free to use its 
>> contents if desired. I find that property useful since this way, there are 
>> no redundant proofs when parsing the collection, which saves time and space 
>> and determines ideal and unique references. To showcase this, I uploaded 
>> example files of the parsed collections to 
>> https://github.com/xamidi/mmsolitaire/tree/master/data. The string 
>> "pmproofs_*n*" in a filename indicates that helper proofs are referenced 
>> for parts that are used in at least *n* different proofs.
>>
>> My findings were assisted by a custom tool I wrote in C++ (still a 
>> private GitHub repo for assisting my research), and I have – If I didn't 
>> mess something up – exhaustively searched all proofs up to a length of 29 
>> steps.
>>
>> — Samiro Discher
>>
>> PS: The contact information on the Metamath website and in pmproofs.txt 
>> is still of Dr. Norman Megill. At first I tried to use it despite him being 
>> deceased, and got no response from anyone.
>>
>> PPS: Is there a way I can add a non-Google mail address to this mailing 
>> list?
>>
>>
>>
