This is really cool and impressive work :) Are there any other optimisations you can think of which might make it quicker? Like how close do you think this is to optimal?
Do you think it would help to implement it for GPUs? Maybe the data blowup is more of a problem; if you already have it for multithreaded CPUs, then that sounds pretty good. How much more do you think you need in terms of computing resources? If you have the program ready to go, it might be nice to apply to some grant bodies for money to go further. Though maybe it's of quite niche interest. Anyway, yeah, this is all really awesome :) Just wanted to say well done!

On Sunday, October 9, 2022 at 9:03:14 PM UTC+1 Samiro Discher wrote:

> Thanks :-)
>
> Yes, the ones for the second proposal are ties, but I wouldn't say it's worth keeping them all, since one could generate huge amounts of ties from what my tool can generate and from the information such a proof – and the collection as a whole – contains. Unless you mean keeping only both the historically first discovered and the alphanumerically minimal variants, since they are both "relevant" in a way.
>
> Let me explain this and my approach a bit. (Sorry for the incoming wall of text. Feel free to skip at your pace. :-)
>
> A parser can quickly extract all subproof strings, e.g. DD2D1DD22D11DD2D112 (length 19) for theorem *1.5 has 7 non-trivial proper subproofs (i.e. excluding axioms and itself): D11, D22, D2D11, DD22D11, DD2D112, D1DD22D11, and D2D1DD22D11. Since every proof is essentially a modus ponens (two inputs and one output), a proof of length *n* ≥ 3 (they all have odd lengths) has up to (*n*-3)/2 non-trivial proper subproofs (there may be overlaps).
>
> These numbers can grow quite large for longer proofs, and there may be many overlaps, e.g. my 1883-step proof of biass alone has 222 different non-trivial proper subproofs. Counting these, the unified collection (pmproofs-unified.txt) consists of 1374 proofs (which are all listed explicitly in data/pmproofs_1-unified.dpc), and FYI the non-unified collection consists of 1436 proofs (listed explicitly in data/pmproofs_1.dpc).
> Now each such proof has a final formula (the conclusion), and many of those formulas have several shortest (known) proofs. There may even be cases where both a formula and a different schema of that formula are suitable, with both having a minimal proof of the same length.
>
> Now for a proof that has *m* uses of a certain conclusion, where that conclusion has *n* shortest (known) proofs, there are *n* to the power of *m* ways to insert proofs of just that conclusion into the final proof (each of the *m* uses can independently be filled with any of the *n* proofs). So we are talking about exponential growth of possibilities based on the number of overlaps and redundant shortest (known) proofs. There are certainly quite a number of other 1883-step proofs of biass that could be deduced from the information that is already in pmproofs.txt. But not in pmproofs-unified.txt, since the overlaps have been removed there.
>
> Now for the overall approach: my parser creates the list of non-trivial subproofs merged from all proofs, sorts them by their lengths, and parses them all to their conclusions.
> Then I used files I generated which contain proofs of all possible conclusions of proofs up to length 29 to find better variants, using a lot of schema checks (my formulas are actually tree structures). If a conclusion *A* from my files is a schema of a conclusion *B* of pmproofs.txt, all string occurrences of proof(*B*) within pmproofs.txt can simply be replaced by proof(*A*), which keeps all the proofs valid and ideally shortens them.
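A rough illustration of the subproof extraction described in the quoted message above (a minimal sketch, not Samiro's actual parser; all names are made up for illustration): since 'D' always takes exactly two arguments and the axiom references 1, 2, 3 take none, a single counter suffices to delimit each subproof.

    // Minimal sketch: list the distinct non-trivial proper subproofs of a
    // D-proof over axioms 1, 2, 3.
    // Grammar: proof := '1' | '2' | '3' | 'D' proof proof
    #include <cstddef>
    #include <iostream>
    #include <set>
    #include <string>

    // Returns the index one past the end of the subproof that starts at 'pos'.
    std::size_t subproofEnd(const std::string& dProof, std::size_t pos) {
        std::size_t open = 1; // number of proof terms still to be read
        while (open > 0) {
            open += (dProof[pos] == 'D') ? 1 : -1; // 'D' fills one slot but opens two
            ++pos;
        }
        return pos;
    }

    int main() {
        const std::string dProof = "DD2D1DD22D11DD2D112"; // 19-step proof of *1.5
        std::set<std::string> subproofs;
        for (std::size_t i = 1; i < dProof.length(); ++i) // skip index 0 (the proof itself)
            if (dProof[i] == 'D') // every inner 'D' starts a non-trivial proper subproof
                subproofs.insert(dProof.substr(i, subproofEnd(dProof, i) - i));
        for (const std::string& s : subproofs)
            std::cout << s << "\n"; // prints the 7 distinct subproofs in lexicographic order
        return 0;
    }

For the quoted example this prints exactly the seven subproofs listed above. The (*n*-3)/2 bound follows because a length-*n* D-proof contains (*n*-1)/2 occurrences of 'D', one of which starts the proof itself.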
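The schema checks mentioned at the end of the quoted paragraph can be pictured along the following lines (again only a sketch under assumptions; the Formula structure and isSchemaOf are illustrative, not the tool's data structures): *A* is a schema of *B* exactly when some substitution for *A*'s variables turns *A* into *B*, which a one-way matching of the two formula trees decides.

    // Minimal sketch: decide whether formula A is a schema of formula B, i.e.
    // whether substituting formulas for A's variables yields exactly B.
    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    struct Formula {
        std::string value; // connective ("->", "~") or variable name
        std::vector<std::shared_ptr<Formula>> children; // empty iff the node is a variable
    };
    using FormulaPtr = std::shared_ptr<Formula>;

    bool equal(const FormulaPtr& a, const FormulaPtr& b) {
        if (a->value != b->value || a->children.size() != b->children.size())
            return false;
        for (std::size_t i = 0; i < a->children.size(); ++i)
            if (!equal(a->children[i], b->children[i]))
                return false;
        return true;
    }

    // One-way matching: extend 'subst' (variable of A -> subformula of B) such that
    // applying it to 'a' yields exactly 'b'; 'b' itself is never substituted into.
    bool isSchemaOf(const FormulaPtr& a, const FormulaPtr& b, std::map<std::string, FormulaPtr>& subst) {
        if (a->children.empty()) { // 'a' is a variable
            auto it = subst.find(a->value);
            if (it == subst.end()) { // first occurrence: bind the variable
                subst[a->value] = b;
                return true;
            }
            return equal(it->second, b); // repeated occurrence: must agree with the earlier binding
        }
        if (a->value != b->value || a->children.size() != b->children.size())
            return false;
        for (std::size_t i = 0; i < a->children.size(); ++i)
            if (!isSchemaOf(a->children[i], b->children[i], subst))
                return false;
        return true;
    }

    FormulaPtr var(const std::string& s) { return std::make_shared<Formula>(Formula{ s, {} }); }
    FormulaPtr imp(const FormulaPtr& x, const FormulaPtr& y) { return std::make_shared<Formula>(Formula{ "->", { x, y } }); }

    int main() {
        FormulaPtr a = imp(var("p"), imp(var("q"), var("p"))); // p -> (q -> p)
        FormulaPtr b = imp(imp(var("x"), var("y")), imp(var("z"), imp(var("x"), var("y")))); // (x -> y) -> (z -> (x -> y))
        std::map<std::string, FormulaPtr> subst;
        std::cout << (isSchemaOf(a, b, subst) ? "A is a schema of B" : "no match") << "\n";
        return 0;
    }

When such a check succeeds, replacing every string occurrence of proof(*B*) by proof(*A*) keeps the collection valid, as described above, because *A* proves a statement at least as general as *B*.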
>
> So I collected all the best potential replacements and then applied them from long to short.
> Not too interesting, I'd say, but rather straightforward (it took me one day when all the basic functionality was already implemented).
> What really took its time and great effort was the very optimized implementation of the D-proof-to-formula (tree structure) parser and subtree checks, and the generator of all possible outcomes. For instance, I can parse pmproofs.txt in less than 170 ms (to what you see in data/pmproofs_2.dpc) using a single thread of my old Intel Core i7-3770, but I use multiple threads to compare and generate results.
>
> The generator of all provable formulas is fully parallelized via Intel's oneTBB library. I essentially implemented a push-down automaton to generate all possible D-proofs of length *n*+2 once all of length *n* are already known, then do this step by step (i.e. file by file). But I sorted out all of the redundancies, including proofs with a conclusion for which there is a schema that has a known proof at least as short. So the remaining proofs have a conclusion that represents a whole bunch of other proofs and potentially other formulas. In short, I kept the amount of data as small as possible.
> Here are the numbers of new representative proofs of lengths up to 29:
> 1 : 3
> 3 : 6
> 5 : 12
> 7 : 38
> 9 : 89
> 11 : 229
> 13 : 672
> 15 : 1844
> 17 : 5221
> 19 : 15275
> 21 : 44206
> 23 : 129885
> 25 : 385789
> 27 : 1149058
> 29 : 3449251
> (5181578 representatives in total)
>
> For example, dProofs29.txt is already 103.477529 megabytes (that is 3449251*(29+1)-1 bytes) in size, and all the parsed data from those files, including the formula trees of all of the conclusions, just barely fits into my 32 GiB of DDR memory. What takes a lot of time currently are the schema checks, of which there are too many (quadratic complexity; pairwise comparisons only limited by fitting formula lengths), and I am still hoping to come up with a database structure to make this more efficient. (For the 27->29 step, everything else took just under an hour on my PC, but the schema filtering took over 17 days.) The gain from schema filtering is small but still has an exponential effect on subsequent files. (926015 proofs were filtered out from the proofs of length 29 in that manner.) Anyway, a supercomputer could probably produce more shortest results using my HPC-ready program, and I will try to get access to one.
>
> I hope this is somewhat readable; if something is still unclear, feel free to ask. And let's hope at least David is still alive .. (´ ∀ ` *)
>
> — Samiro Discher
>
>
> On Sunday, October 9, 2022 at 2:50:47 PM UTC+2, Benoit wrote:
>
>> Congratulations!
>>
>> Since that page of the website is not on git, I think only David can add your proposals. I checked both using drule.c.
>>
>> As for your second proposal, that happened because you found "ties" among shortest known proofs, is that it? Maybe it's worth keeping them all?
>>
>> I'm curious to see that C++ tool when you publicly release it. I don't know when the number of "21 steps" in pmproofs.txt for exhaustive search was written, but being able to reach 29 steps, given the exponential nature of the problem, seems impressive. And also for the non-exhaustive part, the heuristics you're using are probably interesting.
>>
>> Benoît
>>
>>
>> On Thursday, October 6, 2022 at 9:50:28 PM UTC+2 Samiro Discher wrote:
>>
>>> I'd like to contribute my findings to https://us.metamath.org/mmsolitaire/pmproofs.txt.
>>>
>>> I found several (17) shorter proofs. See 'pmproofs.txt' at https://github.com/xamidi/mmsolitaire [changes: 2be2349 <https://github.com/xamidi/mmsolitaire/commit/2be2349a5621eaaee5d5940e98202bd265c9d50a>]:
>>> *2.62: 89 -> 77
>>> *3.31: 103 -> 83
>>> *3.43: 137 -> 117
>>> *3.44: 203 -> 183
>>> *4.14: 321 -> 283
>>> *4.32: 355 -> 317
>>> *4.39: 609 -> 479
>>> *4.41: 271 -> 249
>>> *4.76: 271 -> 249
>>> *4.77: 375 -> 287
>>> *4.86: 617 -> 557
>>> *4.87: 523 -> 447
>>> *5.31: 119 -> 109
>>> *5.6: 203 -> 167
>>> *5.75: 387 -> 351
>>> meredith: 145 -> 141
>>> biass: 2127 -> 1883
>>>
>>> Furthermore, I modified another 26 proofs such that all subproofs of the same formula (ranging over the whole collection) are equal (preferring the lexicographically smaller proof strings). See 'pmproofs-unified.txt'. Feel free to use its contents if desired. I find that property useful since this way, there are no redundant proofs when parsing the collection, which saves time and space and determines ideal and unique references. To showcase this, I uploaded example files of the parsed collections to https://github.com/xamidi/mmsolitaire/tree/master/data. The string "pmproofs_*n*" in a filename indicates that helper proofs are referenced for parts that are used in at least *n* different proofs.
>>>
>>> My findings were assisted by a custom tool I wrote in C++ (still a private GitHub repo for assisting my research), and I have – if I didn't mess something up – exhaustively searched all proofs up to a length of 29 steps.
>>>
>>> — Samiro Discher
>>>
>>> PS: The contact information on the Metamath website and in pmproofs.txt is still that of Dr. Norman Megill. At first I tried to use it despite him being deceased, and got no response from anyone.
>>>
>>> PPS: Is there a way I can add a non-Google mail address to this mailing list?
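Coming back to the generator step described in Samiro's October 9 message above: one way to picture it (a toy sketch under assumptions, not the parallelized oneTBB implementation; names are illustrative) is to prefix 'D' to every pair of known representative proofs whose lengths sum to the target length minus one, and to keep only those candidates that survive condensed detachment and the redundancy/schema filters.

    // Toy sketch: enumerate candidate D-proof strings of the next odd length
    // from already-known representative proofs of smaller odd lengths.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        // knownProofs[k] = representative proofs of length k; here only the axioms.
        std::map<unsigned, std::vector<std::string>> knownProofs = { { 1, { "1", "2", "3" } } };
        const unsigned targetLength = 3; // 31 would be the next real step
        std::vector<std::string> candidates;
        for (unsigned a = 1; a < targetLength; a += 2) {
            const unsigned b = targetLength - 1 - a; // 1 (for 'D') + a + b = targetLength
            for (const std::string& p : knownProofs[a])
                for (const std::string& q : knownProofs[b])
                    candidates.push_back("D" + p + q); // still to be validated and de-duplicated
        }
        std::cout << candidates.size() << " candidates of length " << targetLength << "\n"; // prints 9
        return 0;
    }

Even this toy version hints at the blowup: the next real step would draw its pairs from the 5181578 representatives listed above, which is why removing even "only" 926015 redundant proofs at length 29 pays off exponentially for subsequent files.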
