[Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

David A. Wheeler Mon, 13 Feb 2023 10:20:07 -0800

All: I'm thinking it might be a good idea to create a new "database area",
"mpetxt" (to match "mpeuni"/"mpegif"), with many simple text files - 1 per 
proof.
The goal would be to make it easier for AI/ML tools to learn from them
so they perhaps could generate potential proofs.


Current AI (specifically machine learning) tools have made a lot of progress.
In particular Generative Pre-trained Transformers (GPTs) like GPT-3 have managed
to generate remarkably good or at least useful answers in many cases.
However, they aren't magic. Our compressed database files are hard for them to 
learn
from, and the HTML tables we show are human-readable but also
challenging for a computer to learn from. On the other hand, once presented as 
typical text,
they have internal structures that are similar enough to text and code that it's
plausible similar solutions could also work for them.

It appears that Microsoft is going to invest LOTS of money in scraping the web 
to
learn from, using that to train & generate better results. Google appears to be 
rushing
to do the same, as a countermeasure. They can already generate passable code
from prompts in many situations; even when they get it wrong, people report the
results are often helpful as ideas & information for helping to solve the real 
problem.

So I propose generating text files (1/proof) that would make it easier for these
tools to potentially learn how to generate Metamath proofs. Even when they get 
it wrong,
they *might* provide useful tips when trying to create a proof. It's not clear 
to me that
the 140K samples from MPE are enough for them to learn... but they *might* be, 
and the
only way to know is to let them try. The idea is that you'd ask one of these 
tools
"Please complete the following: Metamath proof of NAME in database set.mm NAME 
$p ...."
and it would complete it with an answer.

Below is an example of what those files might look like. I'd basically exploit
Metamath's "show statement" and "show proof" and try to strip out distractors
that could interfere with machine learning.
It might be wise to make the hypotheses & goal a single long sentence, since 
it's not clear if
users can enter embedded newlines in their prompts, but I haven't done that 
here.
I think it's important that each proof be in its own file, so the ML algorithm 
knows
exactly where they begin and end.

There are currently 40627 $p statements in set.mm. I'm guessing that most will 
fit within
one 4KiB blocks (storage is allocated by blocks), so this would require ~162 
MiB of space.
The us.metamath.org website isn't brimming with space, but we can do that.

--- David A. Wheeler

========================
Metamath proof of pm5.74 in database set.mm
pm5.74 $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch ) ) ) 
$= ... $.

PROOF:
 1 biimp         $p |- ( ( ps <-> ch ) -> ( ps -> ch ) )
 2 1 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) -> ( ph -> 
ch ) ) )
 3 biimpr        $p |- ( ( ps <-> ch ) -> ( ch -> ps ) )
 4 3 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ch ) -> ( ph -> 
ps ) ) )
 5 2,4 impbid    $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) <-> ( ph -> 
ch ) ) )
 6 biimp         $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ps ) -> 
( ph -> ch ) ) )
 7 6 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps -> 
ch ) ) )
 8 biimpr        $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ch ) -> 
( ph -> ps ) ) )
 9 8 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ch -> 
ps ) ) )
10 7,9 impbidd   $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps <-> 
ch ) ) )
11 5,10 impbii   $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> 
ch ) ) )

========================
Metamath proof of axtgcgrrflx in database set.mm
axtrkg.p $e |- P = ( Base ` G ) $.
axtrkg.d $e |- .- = ( dist ` G ) $.
axtrkg.i $e |- I = ( Itv ` G ) $.
axtrkg.g $e |- ( ph -> G e. TarskiG ) $.
axtgcgrrflx.1 $e |- ( ph -> X e. P ) $.
axtgcgrrflx.2 $e |- ( ph -> Y e. P ) $.
axtgcgrrflx $p |- ( ph -> ( X .- Y ) = ( Y .- X ) ) $= ... $.

PROOF:
 1 df-trkg       $a |- TarskiG = ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB 
i^i { f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p 
, y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e. 
( x i z ) ) } ) } ) )
 2 inss1         $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f | [. 
( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y e. ( p \ 
{ x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e. ( x i z ) ) } 
) } ) ) C_ ( TarskiGC i^i TarskiGB )
 3 inss1         $p |- ( TarskiGC i^i TarskiGB ) C_ TarskiGC
 4 2,3 sstri     $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f | [. 
( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y e. ( p \ 
{ x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e. ( x i z ) ) } 
) } ) ) C_ TarskiGC
 5 1,4 eqsstri   $p |- TarskiG C_ TarskiGC
 6 axtrkg.g      $e |- ( ph -> G e. TarskiG )
 7 5,6 sseldi    $p |- ( ph -> G e. TarskiGC )
 8 axtrkg.p      $e |- P = ( Base ` G )
 9 axtrkg.d      $e |- .- = ( dist ` G )
10 axtrkg.i      $e |- I = ( Itv ` G )
11 8,9,10 istrkgc  $p |- ( G e. TarskiGC <-> ( G e. _V /\ ( A. x e. P A. y e. P 
( x .- y ) = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) = ( z .- 
z ) -> x = y ) ) ) )
12 11 simprbi    $p |- ( G e. TarskiGC -> ( A. x e. P A. y e. P ( x .- y ) = ( 
y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) = ( z .- z ) -> x = y ) 
) )
13 12 simpld     $p |- ( G e. TarskiGC -> A. x e. P A. y e. P ( x .- y ) = ( y 
.- x ) )
14 7,13 syl      $p |- ( ph -> A. x e. P A. y e. P ( x .- y ) = ( y .- x ) )
15 axtgcgrrflx.1  $e |- ( ph -> X e. P )
16 axtgcgrrflx.2  $e |- ( ph -> Y e. P )
17 oveq1         $p |- ( x = X -> ( x .- y ) = ( X .- y ) )
18 oveq2         $p |- ( x = X -> ( y .- x ) = ( y .- X ) )
19 17,18 eqeq12d  $p |- ( x = X -> ( ( x .- y ) = ( y .- x ) <-> ( X .- y ) = ( 
y .- X ) ) )
20 oveq2         $p |- ( y = Y -> ( X .- y ) = ( X .- Y ) )
21 oveq1         $p |- ( y = Y -> ( y .- X ) = ( Y .- X ) )
22 20,21 eqeq12d  $p |- ( y = Y -> ( ( X .- y ) = ( y .- X ) <-> ( X .- Y ) = ( 
Y .- X ) ) )
23 19,22 rspc2v  $p |- ( ( X e. P /\ Y e. P ) -> ( A. x e. P A. y e. P ( x .- y 
) = ( y .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
24 15,16,23 syl2anc  $p |- ( ph -> ( A. x e. P A. y e. P ( x .- y ) = ( y .- x 
) -> ( X .- Y ) = ( Y .- X ) ) )
25 14,24 mpd     $p |- ( ph -> ( X .- Y ) = ( Y .- X ) )


-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/6025E3C8-6FC7-4934-BD53-A3EBC813048F%40dwheeler.com.

[Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Reply via email to