I released a new version <https://github.com/xamidi/pmGenerator/releases/tag/1.1.0>
of pmGenerator, which is now capable of distributed computing via MPI
<https://en.wikipedia.org/wiki/Message_Passing_Interface> for filtering
generated proof sets, as previously announced
<https://groups.google.com/g/metamath/c/v0p86y5b-m0/m/vczj35-uAAAJ>.
You can find some information
<https://github.com/xamidi/pmGenerator#multi-node-computing> on that in the
README.
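To give a rough idea of the work-partitioning pattern such filtering lends
itself to, here is a minimal, hypothetical MPI sketch. It is not
pmGenerator's actual code; the candidate loop and the divisibility test are
mere stand-ins for iterating proof candidates and for the real redundancy
check, and the README section linked above describes how the actual mode is
invoked.

    #include <mpi.h>
    #include <cstdio>

    // Hypothetical illustration, not pmGenerator's implementation:
    // each rank filters a disjoint, strided slice of the candidates,
    // and rank 0 sums up how many survived.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long total = 1000000; // stand-in for the candidate count
        long kept = 0;
        // Rank r handles candidates r, r + size, r + 2 * size, ...
        for (long i = rank; i < total; i += size)
            if (i % 3 != 0) // stand-in for the real redundancy check
                kept++;

        long totalKept = 0;
        MPI_Reduce(&kept, &totalKept, 1, MPI_LONG, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("kept %ld of %ld candidates\n", totalKept, total);
        MPI_Finalize();
        return 0;
    }

(Compile and run, e.g., with mpic++ and mpirun.)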
This release completes all core features of the tool. If you find any bugs,
please submit an issue <https://github.com/xamidi/pmGenerator/issues>.

Samiro Discher wrote on Saturday, February 25, 2023, at 00:38:01 UTC+1:
> Small update:
> I fixed an issue where the output wasn't shown on a Windows Command Prompt
> (CMD), by using different linker flags. CMD still isn't capable of printing
> UTF-8 encoded special characters, and I won't fix that, but there are
> better alternatives to CMD (as mentioned in the release notes).
> The new binaries are now published as a new minor release
> <https://github.com/xamidi/pmGenerator/releases/tag/1.0.1>.
>
> Some more changes:
> - Update to the most recent dependencies (Boost 1.81.0 and oneTBB
> 2021.8.0-1)
> - Avoid huge string acquisition during generator file conversion (in hopes
> of avoiding a bug on Windows where huge files were sometimes written
> incompletely)
> - Update README
>
> In case you are looking for the heart of the proof collection generator
> (which also contains exemplary measurements in the comments), here you go
> <https://github.com/xamidi/pmGenerator/blob/1.0.1/nortmann/DlProofEnumerator.cpp#L361-L612>.
> And here
> <https://github.com/xamidi/pmGenerator/blob/1.0.1/nortmann/DlCore.cpp#L540-L620>
> is the schema check over formula strings; look for "polishnotation"
> (case-insensitive) to find more optimized formula handling.
>
> PS: In my last post, instead of
>
> since a substitution to multiple variables – which are links to the same
> node – then is just a single node assignment
>
> I meant to say:
>
> since a substitution to multiple occurrences of the same variable – which
> are links to the same node – then is just a single node assignment
>
> *I forgot to mention initially that I would very much appreciate it if you
> could bring any bugs or issues to my attention. :-)*
>
> — Samiro Discher
>
> Samiro Discher wrote on Thursday, February 23, 2023, at 06:53:45 UTC+1:
>
>> I have just published the code of my pmGenerator
>> <https://github.com/xamidi/pmGenerator> tool and released
>> <https://github.com/xamidi/pmGenerator/releases/tag/1.0.0> a first
>> version.
>> It is performance-oriented, implemented in C++, and uses Intel's oneTBB
>> <https://github.com/oneapi-src/oneTBB> for multithread-capable data
>> structures.
>>
>> The tool uses a command-line interface with different commands capable of
>> parsing and generating variants of pmproofs.txt
>> <https://us.metamath.org/mmsolitaire/pmproofs.txt>.
>>
>> It is meant to generate sets of condensed detachment proofs (called
>> "D-proofs" from now on) for all attainable conclusions of D-proofs up to
>> a certain length, and then to generate an improved version of that
>> database of shortest known D-proofs (whose theorems originate from
>> Principia Mathematica).
>>
>> As I have previously explained in another thread
>> <https://groups.google.com/g/metamath/c/v0p86y5b-m0/m/Rni0LMv_BwAJ>,
>> files dProofs<n>.txt are generated for sets from which proofs of
>> conclusions are filtered out whenever more general variants are proven
>> in the same or fewer steps, and dProofs<n>-unfiltered<m>+.txt for
>> collections where this kind of filtering was disabled for proofs of
>> length m and higher (which are much more time-efficient to generate, but
>> less space-efficient).
>> The project's README
>> <https://github.com/xamidi/pmGenerator/blob/master/README.md> contains a
>> table with links to some of these data files that were already generated.
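>>
>> To illustrate the kind of redundancy test this filtering relies on, here
>> is a minimal, hypothetical sketch of a schema check over formula strings
>> (not the tool's actual implementation), using the prefix notation
>> described further below, where e.g. "CN0.0" encodes ( -. ph -> ph ). It
>> assumes well-formed input.
>>
>>     #include <cctype>
>>     #include <cstdint>
>>     #include <iostream>
>>     #include <string>
>>     #include <unordered_map>
>>
>>     // Reads a variable index at position i; skips the '.' separator
>>     // that follows between two consecutive variables.
>>     std::uint64_t readVar(const std::string& s, std::size_t& i) {
>>         std::uint64_t v = 0;
>>         while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
>>             v = 10 * v + (s[i++] - '0');
>>         if (i < s.size() && s[i] == '.')
>>             i++;
>>         return v;
>>     }
>>
>>     // Consumes one complete subformula, returning it in a canonical
>>     // form in which every variable is followed by '.'.
>>     std::string readSubformula(const std::string& s, std::size_t& i) {
>>         std::string out;
>>         std::size_t pending = 1; // subformulas still to be consumed
>>         while (pending > 0 && i < s.size()) {
>>             char c = s[i];
>>             if (c == 'C') { out += c; i++; pending++; }  // binary operator
>>             else if (c == 'N') { out += c; i++; }        // unary operator
>>             else { out += std::to_string(readVar(s, i)) + '.'; pending--; }
>>         }
>>         return out;
>>     }
>>
>>     // Whether formula f results from schema s by substituting formulas
>>     // for s's variables; repeated variables must map to equal subformulas.
>>     bool isSchemaOf(const std::string& s, const std::string& f) {
>>         std::unordered_map<std::uint64_t, std::string> subst;
>>         std::size_t i = 0, j = 0;
>>         while (i < s.size()) {
>>             char c = s[i];
>>             if (c == 'C' || c == 'N') { // operators must match exactly
>>                 if (j >= f.size() || f[j] != c)
>>                     return false;
>>                 i++; j++;
>>             } else {
>>                 std::uint64_t v = readVar(s, i);
>>                 std::string sub = readSubformula(f, j);
>>                 auto it = subst.find(v);
>>                 if (it == subst.end())
>>                     subst.emplace(v, sub); // first occurrence: record it
>>                 else if (it->second != sub)
>>                     return false;          // repeated variable: must agree
>>             }
>>         }
>>         return j == f.size();
>>     }
>>
>>     int main() {
>>         // ( -. ph -> ph ) is a schema of ( -. ( ph -> ps ) -> ( ph -> ps ) ):
>>         std::cout << isSchemaOf("CN0.0", "CNC0.1C0.1") << '\n'; // 1
>>         std::cout << isSchemaOf("CN0.0", "CN0.1") << '\n';      // 0
>>     }
>>
>> In reality such checks run against huge collections, which is why the
>> optimized handling (and the option to disable filtering for long proofs)
>> matters.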
>> If you've got a really strong computer (preferably with a lot of RAM),
>> you might be able to generate the next higher sets based on the current
>> files and possibly find some new shortest known proofs. Subproofs of
>> lengths up to 33 have been searched exhaustively, as indicated by the
>> currently greatest linked file dProofs33-unfiltered31+.txt, which alone
>> has a size of 8349023875 bytes and contains 45774890 pairs of D-proofs
>> (each of length 33) with their conclusions.
>>
>> *Some changes since I posted in the other thread:*
>> - Data files can now contain conclusions, so the D-proofs do not have to
>> be reparsed while loading (which took a rather long time for the bigger
>> files).
>> - Conclusions are no longer stored as tree structures, but (just like in
>> files) as strings in a prefix notation, where variables are numbered
>> 0, 1, 2, ... by order of occurrence, consecutive variables are separated
>> by '.', and operators are given according to Łukasiewicz-style Polish
>> notation. For example, the formula ( -. ph -> ph ) would be represented
>> by CN0.0.
>> Operating on mere strings improved the tool's space efficiency to
>> essentially the optimum. Fortunately, there are very fast methods (such
>> as abstraction checks) to operate on formulas in prefix notation, so
>> this did not introduce any performance penalty at all (on the contrary,
>> since memory is slow).
>> Note, however, that the D-proof parser still uses tree structures for
>> performance reasons (since a substitution to multiple variables – which
>> are links to the same node – then is just a single node assignment).
>>
>> Notably, only shared memory parallelism is supported. After I fiddled
>> around with MPI <https://en.wikipedia.org/wiki/Message_Passing_Interface>
>> to make it usable on supercomputers, I noticed that this problem isn't,
>> in a way, "friendly" for distributed memory parallelism:
>> - Using shared memory, we're already in a realm where hundreds of GiB of
>> RAM are required to generate the next greater D-proof collection
>> (dProofs35) without page faults.
>> - Modern supercomputers have many nodes, but not every such node has
>> huge amounts of memory, so loading the full collections on every node is
>> not an option.
>> - When using distributed memory to circumvent the memory problem, i.e.
>> distributing the database across all nodes, the insertion of a single
>> D-proof would require the proof (and its conclusion) to be sent in a
>> (worst case: full) loop across the utilized nodes. Given the billions of
>> tests already required (a number that grows exponentially), this would
>> overuse the network and thereby probably be terribly slow. (I might be
>> lacking a good idea here, though.)
>>
>> There is also an exemplary command to extract some plot data from the
>> collections.
>> I noticed, for example, that most conclusions of short proofs have odd
>> symbolic lengths (which are the numbers of nodes in their syntax trees;
>> a small sketch for computing them follows at the end of this message):
>> [image: plot-propEvenThmLen.png]
>> And here is some actual plot data concerning symbolic theorem lengths,
>> visualized ...
>> [image: plot-concSymbLen_23-29_500.png]
>> ... which can be "zoomed in" by only showing data within certain bounds:
>> [image: plot-concSymbLen_23-29_1k,100.png]
>>
>> I hope some will find this tool useful, or at least nice to play around
>> with.
>> As far as I know, it is the first program capable of composing
>> pmproofs.txt, and the fastest one to parse it.
>>
>> If you have any questions, please do not hesitate to ask.
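>>
>> As a small illustration (a hypothetical sketch, not code from the tool),
>> the symbolic length of a formula can be computed directly from its
>> prefix string, without building a syntax tree: every operator and every
>> variable occurrence contributes one node, and '.' separators contribute
>> nothing.
>>
>>     #include <cctype>
>>     #include <iostream>
>>     #include <string>
>>
>>     // Number of nodes in the syntax tree of a formula given in the
>>     // prefix notation described above, e.g. "CN0.0" has length 4.
>>     std::size_t symbolicLength(const std::string& f) {
>>         std::size_t len = 0;
>>         for (std::size_t i = 0; i < f.size();) {
>>             char c = f[i];
>>             if (c == 'C' || c == 'N') {
>>                 len++; i++; // one node per operator
>>             } else if (std::isdigit(static_cast<unsigned char>(c))) {
>>                 len++; // one node per variable, however many digits
>>                 while (i < f.size() &&
>>                        std::isdigit(static_cast<unsigned char>(f[i])))
>>                     i++;
>>             } else
>>                 i++; // skip '.' separators
>>         }
>>         return len;
>>     }
>>
>>     int main() {
>>         std::cout << symbolicLength("CN0.0") << '\n'; // 4: C, N, 0, 0
>>     }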
>>
>> — Samiro Discher
