[Metamath] Re: An exhaustive generator for the mmsolitaire project

Samiro Discher Fri, 24 Feb 2023 15:38:07 -0800

Small update:
I fixed an issue where the output wasn't shown on a Windows Command Prompt 
(CMD), by using different linker flags. CMD still isn't capable to print 
UTF-8 encoded special caracters, and I won't fix that, but there are better 
alternatives to CMD (as mentioned in the release notes).
The new binaries are now published as a new minor release 
<https://github.com/xamidi/pmGenerator/releases/tag/1.0.1>.


Some more changes:
- Update to most recent dependencies (Boost 1.81.0 and oneTBB 2021.8.0-1)
- Avoid huge string acquisition during generator file conversion (in hopes 
to avoid a bug on Windows where sometimes huge files were written 
incompletely)
- Update README 

In case you are looking for the heart of the proof collection generator 
(which also contains exemplary measurements in the comments), here you go 
<https://github.com/xamidi/pmGenerator/blob/1.0.1/nortmann/DlProofEnumerator.cpp#L361-L612>
.
And here 
<https://github.com/xamidi/pmGenerator/blob/1.0.1/nortmann/DlCore.cpp#L540-L620>
 
is the schema check over formula strings, look for "polishnotation" 
(case-insensitive) to find more optimized formula handling.

PS: In my last post, instead of
> since a substitution to multiple variables – which are links to the same 
node – then is just a single node assignment
I meant to say:
> since a substitution to multiple occurences of the same variable – which 
are links to the same node – then is just a single node assignment

*I forgot to mention initially that I very much appreciate if you can bring 
any bugs or issues to my attention. :-)*

— Samiro Discher 

Samiro Discher schrieb am Donnerstag, 23. Februar 2023 um 06:53:45 UTC+1:

> I have just published the code of my pmGenerator 
> <https://github.com/xamidi/pmGenerator> tool and released 
> <https://github.com/xamidi/pmGenerator/releases/tag/1.0.0> a first 
> version.
> It is performance-oriented, implemented in C++, and using Intel's oneTBB 
> <https://github.com/oneapi-src/oneTBB> for multithread-capable data 
> structures.
>
> The tool uses a command-line interface with different commands capable to 
> parse and generate variants of pmproofs.txt 
> <https://us.metamath.org/mmsolitaire/pmproofs.txt>.
>
> It is meant to generate sets of condensed detachment proofs (called 
> "D-proofs" from now on) for all attainable conclusions of D-proofs up to a 
> certain length, and then to generate an improved version of that database 
> of shortest known D-proofs (whose theorems originate from Principia 
> Mathematica). 
>
> As I have previously explained in another thread 
> <https://groups.google.com/g/metamath/c/v0p86y5b-m0/m/Rni0LMv_BwAJ>, 
> there are files dProof<n>.txt generated for sets that have filtered out 
> proofs of conclusions for which there are more general variants proven in 
> lower or equal amounts of steps, and dProof<n>-unfiltered<m>+.txt for 
> collections where this kind of filtering was disabled for proofs of length 
> m and higher (which to generate are much more time efficient but less 
> space efficient).
> The project's README 
> <https://github.com/xamidi/pmGenerator/blob/master/README.md> contains a 
> table with links to some of these data files that were already generated.
> If you've got a really strong computer (preferably with a lot of RAM), you 
> might be able to generate the next higher sets based on the current files 
> and possibly find some new shortest known proofs. Subproofs of lengths up 
> to 33 have been searched exhaustively, as indicated by the currently 
> greatest linked file dProofs33-unfiltered31+.txt, which alone has a size 
> of 8349023875 bytes, containing 45774890 pairs of D-proofs (each of 
> length 33) with their conclusions.
>
> *Some changes since I posted in the other thread:*
> - Data files can now contain conclusions, so the D-proofs do not have to 
> be reparsed while loading (which took a rather long time for the bigger 
> files).
> - Conclusions are no longer stored as tree structures, but (just like in 
> files) as strings in a prefix notation, where variables are numbered 0, 1, 
> 2,... by order off occurence, consecutive variabes are separated by '.', 
> and operators are given according to Łukasiewicz-style Polish notation. For 
> example the formula ( -. ph -> ph ) would be represented by CN0.0.
>   Operating on mere strings improved space efficiency of the tool to 
> essentially what is optimally possible. Fortunately, there are very fast 
> methods (such as abstraction checks) to operate on formulas in prefix 
> notation, such that this did not introduce any penalty on performance at 
> all (but in the contrary since memory is slow).
>   Note that, however, the D-proof parser still uses tree structures for 
> performance reasons (since a substitution to multiple variables – which 
> are links to the same node – then is just a single node assignment).
>
> Notably, only shared memory parallelism is supported. After I fiddled 
> around with MPI <https://en.wikipedia.org/wiki/Message_Passing_Interface> 
> to make it usable on supercomputers, I noticed that this problem isn't, in 
> a way, "friendly" for distributed memory parallelism:
> - Using shared memory, we're already in a realm where hundreds of GiBs of 
> RAM are required to generate the next greater D-proof collection (dProof35) 
> without page faults.
> - Modern supercomputers have many nodes, but not every such node has huge 
> amounts of memory, so loading the full collections on every node is not an 
> option.
> - When using distributed memory to circumvent the memory problem, i.e. 
> distributing the database across all nodes, an insertion of a single 
> D-proof would require the proof (and its conclusion) to be sent in a 
> (worstcase: full) loop across utilized nodes.
>   Due to the already billions (+ exponentially growing) of tests, this 
> would overuse the network and thereby probably be terribly slow. (I might 
> be lacking a good idea here, though.)
>
> There is also an exemplary command to extract some plot data from the 
> collections.
> I noticed, for example, that most conclusions of short proofs have odd 
> symbolic lengths (which are the amounts of nodes in their syntax trees):
> [image: plot-propEvenThmLen.png]
> And here is some actual plot data concerning symbolic theorem lengths 
> visualized..
> [image: plot-concSymbLen_23-29_500.png]
> .. which can be "zoomed in" by only showing data within certain bounds:
> [image: plot-concSymbLen_23-29_1k,100.png]
>
> I hope some will find this tool useful or at least nice to play around 
> with.
> As far as I know, it is the first program capable of composing 
> pmproofs.txt, and the fastest one to parse it.
>
> If you have any questions, please do not hesitate to ask.
>
> — Samiro Discher 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/1c95471c-c42c-4e26-b14f-1fa4aef3479fn%40googlegroups.com.

[Metamath] Re: An exhaustive generator for the mmsolitaire project

Reply via email to