Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Igor Ieskov Tue, 14 Feb 2023 23:09:47 -0800

I used a tool which I recently developed to generate plain text of a proof. 
Currently it generates all steps including floatings, but it could be 
changed to generate only essentials. Also the plain text format may be 
changed and saving in json may be implemented too. One big question is if 
this tool is reliable enough to generate correct output. So just letting 
you know about this alternative approach.


The source code:
https://github.com/expln/metamath-lamp/blob/e82a771eb9f3712e9e8b62e9e133f3e1871dbeda/src/metamath/test/MM_save_proofs_to_files.res#L11

Example output:
--- TBL pm5.74 
---------------------------------------------------------------------------
1: hyp: wph | wff ph
2: hyp: wps | wff ps
3: hyp: wch | wff ch
4: 2, 3 wb | wff ( ps <-> ch )
5: 1, 4 wi | wff ( ph -> ( ps <-> ch ) )
6: 1, 2 wi | wff ( ph -> ps )
7: 1, 3 wi | wff ( ph -> ch )
8: 6, 7 wb | wff ( ( ph -> ps ) <-> ( ph -> ch ) )
9: 2, 3 biimp | |- ( ( ps <-> ch ) -> ( ps -> ch ) )
10: 4, 2, 3, 1, 9 imim3i | |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) 
-> ( ph -> ch ) ) )
11: 2, 3 biimpr | |- ( ( ps <-> ch ) -> ( ch -> ps ) )
12: 4, 3, 2, 1, 11 imim3i | |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ch ) 
-> ( ph -> ps ) ) )
13: 5, 6, 7, 10, 12 impbid | |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) 
<-> ( ph -> ch ) ) )
14: 6, 7 biimp | |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ps ) 
-> ( ph -> ch ) ) )
15: 8, 1, 2, 3, 14 pm2.86d | |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph 
-> ( ps -> ch ) ) )
16: 6, 7 biimpr | |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> ch ) 
-> ( ph -> ps ) ) )
17: 8, 1, 3, 2, 16 pm2.86d | |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph 
-> ( ch -> ps ) ) )
18: 8, 1, 2, 3, 15, 17 impbidd | |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> 
( ph -> ( ps <-> ch ) ) )
19: 5, 8, 13, 18 impbii | |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) 
<-> ( ph -> ch ) ) )
-----------------------------------------------------------------------------------

Igor

On Wednesday, February 15, 2023 at 3:58:00 AM UTC+1 [email protected] wrote:

> I think a json version of the web pages would make a lot more sense than a 
> plain text version, if the intent is to support machine consumption by 
> tools that can't parse metamath already. There are even uses for that in 
> the web pages themselves, since an interactive version of the web pages 
> would like to have the underlying data in a more easily processed form so 
> that it can do things with it. In fact, we could even eliminate the 
> mpeuni/mpegif pages entirely and replace them with a on-the-spot client 
> side renderer which uses the json file as input along with some global 
> formatting data (i.e. a json version of the $t comment), which would 
> significantly decrease the space cost of the web site without changing the 
> user experience at all.
>
> On Tue, Feb 14, 2023 at 9:32 PM Cris Perdue <[email protected]> wrote:
>
>> David,
>>
>> I take this idea as an approach to making metamath proofs more digestible 
>> by machines, and I love it for that.
>>
>> It might be worth mentioning that it is easier than it might appear to 
>> extract the text from a web page as in the Metamath site. In fact a tiny 
>> experiment on my part came out nicer than I hoped.  Opening up the Chrome 
>> debugging console on the page for pm5.74, I did:
>>
>> > console.log(document.body)
>>
>> and got:
>>
>> Metamath Proof Explorer < Previous   Next >
>> Nearby theorems
>> Mirrors  >  Home  >  MPE Home  >  Th. List  >  pm5.74 Structured 
>> version   Visualization version   GIF version
>> Theorem pm5.74 262
>> Description: Distribution of implication over biconditional. Theorem 
>> *5.74 of [WhiteheadRussell] p. 126. (Contributed by NM, 1-Aug-1994.) (Proof 
>> shortened by Wolf Lammen, 11-Apr-2013.)
>> Assertion
>> Ref Expression
>> pm5.74 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) ↔ ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
>> Proof of Theorem pm5.74
>> Step Hyp Ref Expression
>> 1   biimp 207 . . . 4 ⊢ ((𝜓 ↔ 𝜒) → (𝜓 → 𝜒))
>> 2 1 imim3i 64 . . 3 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜓) → (𝜑 → 𝜒)))
>> 3   biimpr 212 . . . 4 ⊢ ((𝜓 ↔ 𝜒) → (𝜒 → 𝜓))
>> 4 3 imim3i 64 . . 3 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜒) → (𝜑 → 𝜓)))
>> 5 2, 4 impbid 204 . 2 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) → ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
>> 6   biimp 207 . . . 4 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → ((𝜑 → 𝜓) → (𝜑 → 
>> 𝜒)))
>> 7 6 pm2.86d 108 . . 3 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜓 → 𝜒)))
>> 8   biimpr 212 . . . 4 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → ((𝜑 → 𝜒) → (𝜑 → 
>> 𝜓)))
>> 9 8 pm2.86d 108 . . 3 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜒 → 𝜓)))
>> 10 7, 9 impbidd 202 . 2 ⊢ (((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)) → (𝜑 → (𝜓 ↔ 𝜒)))
>> 11 5, 10 impbii 201 1 ⊢ ((𝜑 → (𝜓 ↔ 𝜒)) ↔ ((𝜑 → 𝜓) ↔ (𝜑 → 𝜒)))
>> Colors of variables: wff setvar class
>> Syntax hints:   → wi 4   ↔ wb 198
>> This theorem was proved from axioms:  ax-mp 5  ax-1 6  ax-2 7  ax-3 8
>> This theorem depends on definitions:  df-bi 199
>> This theorem is referenced by:  pm5.74i  263  pm5.74ri  264  pm5.74d  265 
>>  pm5.74rd  266  bibi2d  334  pm5.32  570  orbidi  976
>>   Copyright terms: Public domain W3C validator
>>
>>
>> So perhaps the "plain text" may be accessible enough to systems with 
>> well-developed HTML and DOM processing. Arguably it might be more desirable 
>> to have the ASCII text rather than Unicode in the formulas, as parsing of 
>> the ASCII text is presumably more well-defined.
>>
>> There is something quite similar that I think would be even cooler: 
>> JSON-format versions of theorem pages. JSON is a well-defined textual 
>> format, extremely popular and widespread in use on the Web and elsewhere. 
>> I'm thinking of a JSON-based format something like:
>>
>> {"theorem": "m5.74",
>>  "hyps": [],
>>  "wff": "( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch ) ) 
>> )",
>>  "proof": [
>>  {"i": 1, "ref": "biimp" "f": "( ( ps <-> ch ) -> ( ps -> ch ) )"},
>>  {"i": 2, "deps": [1], "ref": "imim3i", "f":( ( ph -> ( ps <-> ch ) ) -> 
>> ( ( ph -> ps ) -> ( ph -> ch ) ) )"},
>>  {"i": 3 "ref": "biimpr", "f": "( ( ps <-> ch ) -> ( ch -> ps ) )"},
>>  {"i": 4 "deps": [3], "ref": "imim3i", "f": "( ( ph -> ( ps <-> ch ) ) -> 
>> ( ( ph -> ch ) -> ( ph -> ps ) ) )"},
>>  {"i": 5, "deps": [2,4], "ref": "impbid", "f":  "( ( ph -> ( ps <-> ch ) 
>> ) -> ( ( ph -> ps ) <-> ( ph -> ch ) ) )"},
>>  (and so on)
>>  ]
>> }
>>
>> (The details of course could vary, while still making use of the JSON 
>> format.)
>>
>> Presumably pages in such a format would include enough information such 
>> as distinct variable groups,
>> to enable reproducing actual proofs. It could be used as data for 
>> experimental proof displayers. It would also
>> be very straightforward to build web apps that present proofs in custom 
>> formats within web browsers, as
>> browsers support JSON format very well.
>>
>> I see no reason this kind of format would should not be at least as 
>> accessible to AI as plain text.
>>
>> -Cris
>>
>>
>> On Mon, Feb 13, 2023 at 10:20 AM David A. Wheeler <[email protected]> 
>> wrote:
>>
>>> All: I'm thinking it might be a good idea to create a new "database 
>>> area",
>>> "mpetxt" (to match "mpeuni"/"mpegif"), with many simple text files - 1 
>>> per proof.
>>> The goal would be to make it easier for AI/ML tools to learn from them
>>> so they perhaps could generate potential proofs.
>>>
>>> Current AI (specifically machine learning) tools have made a lot of 
>>> progress.
>>> In particular Generative Pre-trained Transformers (GPTs) like GPT-3 have 
>>> managed
>>> to generate remarkably good or at least useful answers in many cases.
>>> However, they aren't magic. Our compressed database files are hard for 
>>> them to learn
>>> from, and the HTML tables we show are human-readable but also
>>> challenging for a computer to learn from. On the other hand, once 
>>> presented as typical text,
>>> they have internal structures that are similar enough to text and code 
>>> that it's
>>> plausible similar solutions could also work for them.
>>>
>>> It appears that Microsoft is going to invest LOTS of money in scraping 
>>> the web to
>>> learn from, using that to train & generate better results. Google 
>>> appears to be rushing
>>> to do the same, as a countermeasure. They can already generate passable 
>>> code
>>> from prompts in many situations; even when they get it wrong, people 
>>> report the
>>> results are often helpful as ideas & information for helping to solve 
>>> the real problem.
>>>
>>> So I propose generating text files (1/proof) that would make it easier 
>>> for these
>>> tools to potentially learn how to generate Metamath proofs. Even when 
>>> they get it wrong,
>>> they *might* provide useful tips when trying to create a proof. It's not 
>>> clear to me that
>>> the 140K samples from MPE are enough for them to learn... but they 
>>> *might* be, and the
>>> only way to know is to let them try. The idea is that you'd ask one of 
>>> these tools
>>> "Please complete the following: Metamath proof of NAME in database 
>>> set.mm NAME $p ...."
>>> and it would complete it with an answer.
>>>
>>> Below is an example of what those files might look like. I'd basically 
>>> exploit
>>> Metamath's "show statement" and "show proof" and try to strip out 
>>> distractors
>>> that could interfere with machine learning.
>>> It might be wise to make the hypotheses & goal a single long sentence, 
>>> since it's not clear if
>>> users can enter embedded newlines in their prompts, but I haven't done 
>>> that here.
>>> I think it's important that each proof be in its own file, so the ML 
>>> algorithm knows
>>> exactly where they begin and end.
>>>
>>> There are currently 40627 $p statements in set.mm. I'm guessing that 
>>> most will fit within
>>> one 4KiB blocks (storage is allocated by blocks), so this would require 
>>> ~162 MiB of space.
>>> The us.metamath.org website isn't brimming with space, but we can do 
>>> that.
>>>
>>> --- David A. Wheeler
>>>
>>> ========================
>>> Metamath proof of pm5.74 in database set.mm
>>> pm5.74 $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> ( ph -> ch 
>>> ) ) ) $= ... $.
>>>
>>> PROOF:
>>>  1 biimp         $p |- ( ( ps <-> ch ) -> ( ps -> ch ) )
>>>  2 1 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) -> ( 
>>> ph -> ch ) ) )
>>>  3 biimpr        $p |- ( ( ps <-> ch ) -> ( ch -> ps ) )
>>>  4 3 imim3i      $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ch ) -> ( 
>>> ph -> ps ) ) )
>>>  5 2,4 impbid    $p |- ( ( ph -> ( ps <-> ch ) ) -> ( ( ph -> ps ) <-> ( 
>>> ph -> ch ) ) )
>>>  6 biimp         $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> 
>>> ps ) -> ( ph -> ch ) ) )
>>>  7 6 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( 
>>> ps -> ch ) ) )
>>>  8 biimpr        $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ( ph -> 
>>> ch ) -> ( ph -> ps ) ) )
>>>  9 8 pm2.86d     $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( 
>>> ch -> ps ) ) )
>>> 10 7,9 impbidd   $p |- ( ( ( ph -> ps ) <-> ( ph -> ch ) ) -> ( ph -> ( 
>>> ps <-> ch ) ) )
>>> 11 5,10 impbii   $p |- ( ( ph -> ( ps <-> ch ) ) <-> ( ( ph -> ps ) <-> 
>>> ( ph -> ch ) ) )
>>>
>>> ========================
>>> Metamath proof of axtgcgrrflx in database set.mm
>>> axtrkg.p $e |- P = ( Base ` G ) $.
>>> axtrkg.d $e |- .- = ( dist ` G ) $.
>>> axtrkg.i $e |- I = ( Itv ` G ) $.
>>> axtrkg.g $e |- ( ph -> G e. TarskiG ) $.
>>> axtgcgrrflx.1 $e |- ( ph -> X e. P ) $.
>>> axtgcgrrflx.2 $e |- ( ph -> Y e. P ) $.
>>> axtgcgrrflx $p |- ( ph -> ( X .- Y ) = ( Y .- X ) ) $= ... $.
>>>
>>> PROOF:
>>>  1 df-trkg       $a |- TarskiG = ( ( TarskiGC i^i TarskiGB ) i^i ( 
>>> TarskiGCB i^i { f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` 
>>> f ) = ( x e. p , y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. 
>>> ( z i y ) \/ y e. ( x i z ) ) } ) } ) )
>>>  2 inss1         $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { 
>>> f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , 
>>> y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y 
>>> e. ( x i z ) ) } ) } ) ) C_ ( TarskiGC i^i TarskiGB )
>>>  3 inss1         $p |- ( TarskiGC i^i TarskiGB ) C_ TarskiGC
>>>  4 2,3 sstri     $p |- ( ( TarskiGC i^i TarskiGB ) i^i ( TarskiGCB i^i { 
>>> f | [. ( Base ` f ) / p ]. [. ( Itv ` f ) / i ]. ( LineG ` f ) = ( x e. p , 
>>> y e. ( p \ { x } ) |-> { z e. p | ( z e. ( x i y ) \/ x e. ( z i y ) \/ y 
>>> e. ( x i z ) ) } ) } ) ) C_ TarskiGC
>>>  5 1,4 eqsstri   $p |- TarskiG C_ TarskiGC
>>>  6 axtrkg.g      $e |- ( ph -> G e. TarskiG )
>>>  7 5,6 sseldi    $p |- ( ph -> G e. TarskiGC )
>>>  8 axtrkg.p      $e |- P = ( Base ` G )
>>>  9 axtrkg.d      $e |- .- = ( dist ` G )
>>> 10 axtrkg.i      $e |- I = ( Itv ` G )
>>> 11 8,9,10 istrkgc  $p |- ( G e. TarskiGC <-> ( G e. _V /\ ( A. x e. P A. 
>>> y e. P ( x .- y ) = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y 
>>> ) = ( z .- z ) -> x = y ) ) ) )
>>> 12 11 simprbi    $p |- ( G e. TarskiGC -> ( A. x e. P A. y e. P ( x .- y 
>>> ) = ( y .- x ) /\ A. x e. P A. y e. P A. z e. P ( ( x .- y ) = ( z .- z ) 
>>> -> x = y ) ) )
>>> 13 12 simpld     $p |- ( G e. TarskiGC -> A. x e. P A. y e. P ( x .- y ) 
>>> = ( y .- x ) )
>>> 14 7,13 syl      $p |- ( ph -> A. x e. P A. y e. P ( x .- y ) = ( y .- x 
>>> ) )
>>> 15 axtgcgrrflx.1  $e |- ( ph -> X e. P )
>>> 16 axtgcgrrflx.2  $e |- ( ph -> Y e. P )
>>> 17 oveq1         $p |- ( x = X -> ( x .- y ) = ( X .- y ) )
>>> 18 oveq2         $p |- ( x = X -> ( y .- x ) = ( y .- X ) )
>>> 19 17,18 eqeq12d  $p |- ( x = X -> ( ( x .- y ) = ( y .- x ) <-> ( X .- 
>>> y ) = ( y .- X ) ) )
>>> 20 oveq2         $p |- ( y = Y -> ( X .- y ) = ( X .- Y ) )
>>> 21 oveq1         $p |- ( y = Y -> ( y .- X ) = ( Y .- X ) )
>>> 22 20,21 eqeq12d  $p |- ( y = Y -> ( ( X .- y ) = ( y .- X ) <-> ( X .- 
>>> Y ) = ( Y .- X ) ) )
>>> 23 19,22 rspc2v  $p |- ( ( X e. P /\ Y e. P ) -> ( A. x e. P A. y e. P ( 
>>> x .- y ) = ( y .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
>>> 24 15,16,23 syl2anc  $p |- ( ph -> ( A. x e. P A. y e. P ( x .- y ) = ( 
>>> y .- x ) -> ( X .- Y ) = ( Y .- X ) ) )
>>> 25 14,24 mpd     $p |- ( ph -> ( X .- Y ) = ( Y .- X ) )
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Metamath" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/metamath/6025E3C8-6FC7-4934-BD53-A3EBC813048F%40dwheeler.com
>>> .
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Metamath" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/metamath/CAOoe%3DWLdRVvxzGv1H%3D4U6uNYMgScV3atzD1LB3S9bKya60d7tw%40mail.gmail.com
>>  
>> <https://groups.google.com/d/msgid/metamath/CAOoe%3DWLdRVvxzGv1H%3D4U6uNYMgScV3atzD1LB3S9bKya60d7tw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/f83df974-d000-4937-b44c-56e7230a11fan%40googlegroups.com.

Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Reply via email to