Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Cris Perdue Tue, 14 Feb 2023 18:32:37 -0800

David,

I take this idea as an approach to making metamath proofs more digestible
by machines, and I love it for that.


It might be worth mentioning that it is easier than it might appear to
extract the text from a web page as in the Metamath site. In fact a tiny
experiment on my part came out nicer than I hoped.  Opening up the Chrome
debugging console on the page for pm5.74, I did:

> console.log(document.body)

and got:

Metamath Proof Explorer < Previous   Next >
Nearby theorems
Mirrors  >  Home  >  MPE Home  >  Th. List  >  pm5.74 Structured version
Visualization version   GIF version
Theorem pm5.74 262
Description: Distribution of implication over biconditional. Theorem *5.74
of [WhiteheadRussell] p. 126. (Contributed by NM, 1-Aug-1994.) (Proof
shortened by Wolf Lammen, 11-Apr-2013.)
Assertion
Ref Expression
pm5.74 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) ↔ ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
Proof of Theorem pm5.74
Step Hyp Ref Expression
1   biimp 207 . . . 4 ⊢ ((𝜓 ↔ 𝜒) → (𝜓 → 𝜒))
2 1 imim3i 64 . . 3 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜓) → (𝜑 → 𝜒)))
3   biimpr 212 . . . 4 ⊢ ((𝜓 ↔ 𝜒) → (𝜒 → 𝜓))
4 3 imim3i 64 . . 3 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜒) → (𝜑 → 𝜓)))
5 2, 4 impbid 204 . 2 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
6   biimp 207 . . . 4 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → ((𝜑 → 𝜓) → (𝜑 → 𝜒)))
7 6 pm2.86d 108 . . 3 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜓 → 𝜒)))
8   biimpr 212 . . . 4 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → ((𝜑 → 𝜒) → (𝜑 → 𝜓)))
9 8 pm2.86d 108 . . 3 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜒 → 𝜓)))
10 7, 9 impbidd 202 . 2 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜓 ↔ 𝜒)))
11 5, 10 impbii 201 1 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) ↔ ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
Colors of variables: wff setvar class
Syntax hints:   → wi 4   ↔ wb 198
This theorem was proved from axioms:  ax-mp 5  ax-1 6  ax-2 7  ax-3 8
This theorem depends on definitions:  df-bi 199
This theorem is referenced by:  pm5.74i  263  pm5.74ri  264  pm5.74d  265
 pm5.74rd  266  bibi2d  334  pm5.32  570  orbidi  976
  Copyright terms: Public domain W3C validator


So perhaps the "plain text" may be accessible enough to systems with
well-developed HTML and DOM processing. Arguably it might be more desirable
to have the ASCII text rather than Unicode in the formulas, as parsing of
the ASCII text is presumably more well-defined.

There is something quite similar that I think would be even cooler:
JSON-format versions of theorem pages. JSON is a well-defined textual
format, extremely popular and widespread in use on the Web and elsewhere.
I'm thinking of a JSON-based format something like:

{"theorem": "m5.74",
 "hyps": [],
 "wff": "( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch ) ) )",
 "proof": [
 {"i": 1, "ref": "biimp" "f": "( ( ps <-> ch ) -> ( ps -> ch ) )"},
 {"i": 2, "deps": [1], "ref": "imim3i", "f":( ( ph -> ( ps <-> ch ) ) -> (
( ph -> ps ) -> ( ph -> ch ) ) )"},
 {"i": 3 "ref": "biimpr", "f": "( ( ps <-> ch ) -> ( ch -> ps ) )"},
 {"i": 4 "deps": [3], "ref": "imim3i", "f": "( ( ph -> ( ps <-> ch ) ) -> (
( ph -> ch ) -> ( ph -> ps ) ) )"},
 {"i": 5, "deps": [2,4], "ref": "impbid", "f":  "( ( ph -> ( ps <-> ch ) )
-> ( ( ph -> ps ) <-> ( ph -> ch ) ) )"},
 (and so on)
 ]
}

(The details of course could vary, while still making use of the JSON
format.)

Presumably pages in such a format would include enough information such as
distinct variable groups,
to enable reproducing actual proofs. It could be used as data for
experimental proof displayers. It would also
be very straightforward to build web apps that present proofs in custom
formats within web browsers, as
browsers support JSON format very well.

I see no reason this kind of format would should not be at least as
accessible to AI as plain text.

-Cris


On Mon, Feb 13, 2023 at 10:20 AM David A. Wheeler <[email protected]>
wrote:

> All: I'm thinking it might be a good idea to create a new "database area",
> "mpetxt" (to match "mpeuni"/"mpegif"), with many simple text files - 1 per
> proof.
> The goal would be to make it easier for AI/ML tools to learn from them
> so they perhaps could generate potential proofs.
>
> Current AI (specifically machine learning) tools have made a lot of
> progress.
> In particular Generative Pre-trained Transformers (GPTs) like GPT-3 have
> managed
> to generate remarkably good or at least useful answers in many cases.
> However, they aren't magic. Our compressed database files are hard for
> them to learn
> from, and the HTML tables we show are human-readable but also
> challenging for a computer to learn from. On the other hand, once
> presented as typical text,
> they have internal structures that are similar enough to text and code
> that it's
> plausible similar solutions could also work for them.
>
> It appears that Microsoft is going to invest LOTS of money in scraping the
> web to
> learn from, using that to train & generate better results. Google appears
> to be rushing
> to do the same, as a countermeasure. They can already generate passable
> code
> from prompts in many situations; even when they get it wrong, people
> report the
> results are often helpful as ideas & information for helping to solve the
> real problem.
>
> So I propose generating text files (1/proof) that would make it easier for
> these
> tools to potentially learn how to generate Metamath proofs. Even when they
> get it wrong,
> they *might* provide useful tips when trying to create a proof. It's not
> clear to me that
> the 140K samples from MPE are enough for them to learn... but they *might*
> be, and the
> only way to know is to let them try. The idea is that you'd ask one of
> these tools
> "Please complete the following: Metamath proof of NAME in database set.mm
> NAME $p ...."
> and it would complete it with an answer.
>
> Below is an example of what those files might look like. I'd basically
> exploit
> Metamath's "show statement" and "show proof" and try to strip out
> distractors
> that could interfere with machine learning.
> It might be wise to make the hypotheses & goal a single long sentence,
> since it's not clear if
> users can enter embedded newlines in their prompts, but I haven't done
> that here.
> I think it's important that each proof be in its own file, so the ML
> algorithm knows
> exactly where they begin and end.
>
> There are currently 40627 $p statements in set.mm. I'm guessing that most
> will fit within
> one 4KiB blocks (storage is allocated by blocks), so this would require
> ~162 MiB of space.
> The us.metamath.org website isn't brimming with space, but we can do that.
>
> --- David A. Wheeler
>
> ========================
> Metamath proof of pm5.74 in database set.mm
> pm5.74 $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch )
> ) ) $= ... $.
>
> PROOF:
>  1 biimp         $p |- ( ( ps <-> ch ) -> ( ps -> ch ) )
>  2 1 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) -> ( ph
> -> ch ) ) )
>  3 biimpr        $p |- ( ( ps <-> ch ) -> ( ch -> ps ) )
>  4 3 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ch ) -> ( ph
> -> ps ) ) )
>  5 2,4 impbid    $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) <-> (
> ph -> ch ) ) )
>  6 biimp         $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ps
> ) -> ( ph -> ch ) ) )
>  7 6 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps
> -> ch ) ) )
>  8 biimpr        $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ch
> ) -> ( ph -> ps ) ) )
>  9 8 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ch
> -> ps ) ) )
> 10 7,9 impbidd   $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( ps
> <-> ch ) ) )
> 11 5,10 impbii   $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> (
> ph -> ch ) ) )
>
> ========================
> Metamath proof of axtgcgrrflx in database set.mm
> axtrkg.p $e |- P = ( Base ` G ) $.
> axtrkg.d $e |- .- = ( dist ` G ) $.
> axtrkg.i $e |- I = ( Itv ` G ) $.
> axtrkg.g $e |- ( ph -> G e. TarskiG ) $.
> axtgcgrrflx.1 $e |- ( ph -> X e. P ) $.
> axtgcgrrflx.2 $e |- ( ph -> Y e. P ) $.
> axtgcgrrflx $p |- ( ph -> ( X .- Y ) = ( Y .- X ) ) $= ... $.
>
> PROOF:
>  1 df-trkg       $a |- TarskiG = ( ( TarskiGC i^i TarskiGB ) i^i (
> TarskiGCB i^i { f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG `
> f ) = ( x e. p , y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e.
> ( z i y ) \/ y e. ( x i z ) ) } ) } ) )
>  2 inss1         $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f
> | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y
> e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e.
> ( x i z ) ) } ) } ) ) C_ ( TarskiGC i^i TarskiGB )
>  3 inss1         $p |- ( TarskiGC i^i TarskiGB ) C_ TarskiGC
>  4 2,3 sstri     $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { f
> | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , y
> e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y e.
> ( x i z ) ) } ) } ) ) C_ TarskiGC
>  5 1,4 eqsstri   $p |- TarskiG C_ TarskiGC
>  6 axtrkg.g      $e |- ( ph -> G e. TarskiG )
>  7 5,6 sseldi    $p |- ( ph -> G e. TarskiGC )
>  8 axtrkg.p      $e |- P = ( Base ` G )
>  9 axtrkg.d      $e |- .- = ( dist ` G )
> 10 axtrkg.i      $e |- I = ( Itv ` G )
> 11 8,9,10 istrkgc  $p |- ( G e. TarskiGC <-> ( G e. _V /\ ( A. x e. P A. y
> e. P ( x .- y ) = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y )
> = ( z .- z ) -> x = y ) ) ) )
> 12 11 simprbi    $p |- ( G e. TarskiGC -> ( A. x e. P A. y e. P ( x .- y )
> = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) = ( z .- z ) ->
> x = y ) ) )
> 13 12 simpld     $p |- ( G e. TarskiGC -> A. x e. P A. y e. P ( x .- y ) =
> ( y .- x ) )
> 14 7,13 syl      $p |- ( ph -> A. x e. P A. y e. P ( x .- y ) = ( y .- x )
> )
> 15 axtgcgrrflx.1  $e |- ( ph -> X e. P )
> 16 axtgcgrrflx.2  $e |- ( ph -> Y e. P )
> 17 oveq1         $p |- ( x = X -> ( x .- y ) = ( X .- y ) )
> 18 oveq2         $p |- ( x = X -> ( y .- x ) = ( y .- X ) )
> 19 17,18 eqeq12d  $p |- ( x = X -> ( ( x .- y ) = ( y .- x ) <-> ( X .- y
> ) = ( y .- X ) ) )
> 20 oveq2         $p |- ( y = Y -> ( X .- y ) = ( X .- Y ) )
> 21 oveq1         $p |- ( y = Y -> ( y .- X ) = ( Y .- X ) )
> 22 20,21 eqeq12d  $p |- ( y = Y -> ( ( X .- y ) = ( y .- X ) <-> ( X .- Y
> ) = ( Y .- X ) ) )
> 23 19,22 rspc2v  $p |- ( ( X e. P /\ Y e. P ) -> ( A. x e. P A. y e. P ( x
> .- y ) = ( y .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
> 24 15,16,23 syl2anc  $p |- ( ph -> ( A. x e. P A. y e. P ( x .- y ) = ( y
> .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
> 25 14,24 mpd     $p |- ( ph -> ( X .- Y ) = ( Y .- X ) )
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Metamath" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/metamath/6025E3C8-6FC7-4934-BD53-A3EBC813048F%40dwheeler.com
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/CAOoe%3DWLdRVvxzGv1H%3D4U6uNYMgScV3atzD1LB3S9bKya60d7tw%40mail.gmail.com.

Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Reply via email to