[Metamath] Re: Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Jon P Tue, 14 Feb 2023 10:47:00 -0800

I think that's a great idea David! Seems like very low downside and it 
would be interesting to see if language models could train on the datasets.


There's already an option to Customise GPT3 with additional datasets so 
once this is ready it could be an interesting experiment to try.

https://openai.com/blog/customized-gpt-3/

On Monday, February 13, 2023 at 6:20:03 PM UTC David A. Wheeler wrote:

> All: I'm thinking it might be a good idea to create a new "database area",
> "mpetxt" (to match "mpeuni"/"mpegif"), with many simple text files - 1 per 
> proof.
> The goal would be to make it easier for AI/ML tools to learn from them
> so they perhaps could generate potential proofs.
>
> Current AI (specifically machine learning) tools have made a lot of 
> progress.
> In particular Generative Pre-trained Transformers (GPTs) like GPT-3 have 
> managed
> to generate remarkably good or at least useful answers in many cases.
> However, they aren't magic. Our compressed database files are hard for 
> them to learn
> from, and the HTML tables we show are human-readable but also
> challenging for a computer to learn from. On the other hand, once 
> presented as typical text,
> they have internal structures that are similar enough to text and code 
> that it's
> plausible similar solutions could also work for them.
>
> It appears that Microsoft is going to invest LOTS of money in scraping the 
> web to
> learn from, using that to train & generate better results. Google appears 
> to be rushing
> to do the same, as a countermeasure. They can already generate passable 
> code
> from prompts in many situations; even when they get it wrong, people 
> report the
> results are often helpful as ideas & information for helping to solve the 
> real problem.
>
> So I propose generating text files (1/proof) that would make it easier for 
> these
> tools to potentially learn how to generate Metamath proofs. Even when they 
> get it wrong,
> they *might* provide useful tips when trying to create a proof. It's not 
> clear to me that
> the 140K samples from MPE are enough for them to learn... but they *might* 
> be, and the
> only way to know is to let them try. The idea is that you'd ask one of 
> these tools
> "Please complete the following: Metamath proof of NAME in database set.mm 
> NAME $p ...."
> and it would complete it with an answer.
>
> Below is an example of what those files might look like. I'd basically 
> exploit
> Metamath's "show statement" and "show proof" and try to strip out 
> distractors
> that could interfere with machine learning.
> It might be wise to make the hypotheses & goal a single long sentence, 
> since it's not clear if
> users can enter embedded newlines in their prompts, but I haven't done 
> that here.
> I think it's important that each proof be in its own file, so the ML 
> algorithm knows
> exactly where they begin and end.
>
> There are currently 40627 $p statements in set.mm. I'm guessing that most 
> will fit within
> one 4KiB blocks (storage is allocated by blocks), so this would require 
> ~162 MiB of space.
> The us.metamath.org website isn't brimming with space, but we can do that.
>
> --- David A. Wheeler
>
> ========================
> Metamath proof of pm5.74 in database set.mm
> pm5.74 $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch ) 
> ) ) $= ... $.
>
> PROOF:
> 1 biimp $p |- ( ( ps <-> ch ) -> ( ps -> ch ) )
> 2 1 imim3i $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch 
> ) ) )
> 3 biimpr $p |- ( ( ps <-> ch ) -> ( ch -> ps ) )
> 4 3 imim3i $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ch ) -> ( ph -> ps 
> ) ) )
> 5 2,4 impbid $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) <-> ( ph -> 
> ch ) ) )
> 6 biimp $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ps ) -> ( 
> ph -> ch ) ) )
> 7 6 pm2.86d $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps -> 
> ch ) ) )
> 8 biimpr $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ch ) -> ( 
> ph -> ps ) ) )
> 9 8 pm2.86d $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ch -> 
> ps ) ) )
> 10 7,9 impbidd $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps 
> <-> ch ) ) )
> 11 5,10 impbii $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph 
> -> ch ) ) )
>
> ========================
> Metamath proof of axtgcgrrflx in database set.mm
> axtrkg.p $e |- P = ( Base ` G ) $.
> axtrkg.d $e |- .- = ( dist ` G ) $.
> axtrkg.i $e |- I = ( Itv ` G ) $.
> axtrkg.g $e |- ( ph -> G e. TarskiG ) $.
> axtgcgrrflx.1 $e |- ( ph -> X e. P ) $.
> axtgcgrrflx.2 $e |- ( ph -> Y e. P ) $.
> axtgcgrrflx $p |- ( ph -> ( X .- Y ) = ( Y .- X ) ) $= ... $.
>
> PROOF:
> 1 df-trkg $a |- TarskiG = ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i 
> { f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p 
> , y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y 
> e. ( x i z ) ) } ) } ) )
> 2 inss1 $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f | [. ( 
> Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y e. ( p 
> \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e. ( x i z 
> ) ) } ) } ) ) C_ ( TarskiGC i^i TarskiGB )
> 3 inss1 $p |- ( TarskiGC i^i TarskiGB ) C_ TarskiGC
> 4 2,3 sstri $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f | [. 
> ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y e. ( 
> p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e. ( x i 
> z ) ) } ) } ) ) C_ TarskiGC
> 5 1,4 eqsstri $p |- TarskiG C_ TarskiGC
> 6 axtrkg.g $e |- ( ph -> G e. TarskiG )
> 7 5,6 sseldi $p |- ( ph -> G e. TarskiGC )
> 8 axtrkg.p $e |- P = ( Base ` G )
> 9 axtrkg.d $e |- .- = ( dist ` G )
> 10 axtrkg.i $e |- I = ( Itv ` G )
> 11 8,9,10 istrkgc $p |- ( G e. TarskiGC <-> ( G e. _V /\ ( A. x e. P A. y 
> e. P ( x .- y ) = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) 
> = ( z .- z ) -> x = y ) ) ) )
> 12 11 simprbi $p |- ( G e. TarskiGC -> ( A. x e. P A. y e. P ( x .- y ) = 
> ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) = ( z .- z ) -> x 
> = y ) ) )
> 13 12 simpld $p |- ( G e. TarskiGC -> A. x e. P A. y e. P ( x .- y ) = ( y 
> .- x ) )
> 14 7,13 syl $p |- ( ph -> A. x e. P A. y e. P ( x .- y ) = ( y .- x ) )
> 15 axtgcgrrflx.1 $e |- ( ph -> X e. P )
> 16 axtgcgrrflx.2 $e |- ( ph -> Y e. P )
> 17 oveq1 $p |- ( x = X -> ( x .- y ) = ( X .- y ) )
> 18 oveq2 $p |- ( x = X -> ( y .- x ) = ( y .- X ) )
> 19 17,18 eqeq12d $p |- ( x = X -> ( ( x .- y ) = ( y .- x ) <-> ( X .- y ) 
> = ( y .- X ) ) )
> 20 oveq2 $p |- ( y = Y -> ( X .- y ) = ( X .- Y ) )
> 21 oveq1 $p |- ( y = Y -> ( y .- X ) = ( Y .- X ) )
> 22 20,21 eqeq12d $p |- ( y = Y -> ( ( X .- y ) = ( y .- X ) <-> ( X .- Y ) 
> = ( Y .- X ) ) )
> 23 19,22 rspc2v $p |- ( ( X e. P /\ Y e. P ) -> ( A. x e. P A. y e. P ( x 
> .- y ) = ( y .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
> 24 15,16,23 syl2anc $p |- ( ph -> ( A. x e. P A. y e. P ( x .- y ) = ( y 
> .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
> 25 14,24 mpd $p |- ( ph -> ( X .- Y ) = ( Y .- X ) )
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/fb41f92d-a95f-4d23-9f08-c317e5df27a9n%40googlegroups.com.

[Metamath] Re: Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Reply via email to