These are nice points, David, and I agree with you.

1. Yes, I agree that backwards chaining might be easier than forwards, 
partly because each step brings you closer to what is already in the 
database.
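
To make that concrete, here's a toy sketch in Python (the rule database 
and the facts are made up, nothing Metamath-specific) of how each 
backward step reduces the goal to things already known:

    # Toy backward chaining: start from the goal and keep replacing it with
    # the hypotheses of whichever rule concludes it, until everything left
    # is already a known fact.
    RULES = {            # conclusion -> hypotheses needed to derive it
        "C": ["A", "B"],
        "B": ["A"],
    }
    FACTS = {"A"}        # statements "already in the database"

    def prove_backward(goal, depth=0):
        pad = "  " * depth
        if goal in FACTS:
            print(pad + goal + ": already in the database")
            return True
        if goal not in RULES:
            print(pad + goal + ": no rule concludes this")
            return False
        print(pad + goal + ": needs " + ", ".join(RULES[goal]))
        return all(prove_backward(h, depth + 1) for h in RULES[goal])

    prove_backward("C")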

2. I agree about automatically generating training data so that there is 
a lot of it. That seems like a simpler project than training an LLM 
directly?
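
For instance, using the ax-mp shape David gives below, something like 
this (schematic notation only, not valid set.mm text; the output format 
is made up) could churn out arbitrarily many instances for the model to 
learn the underlying rule from:

    import random

    ATOMS = ["ph", "ps", "ch", "th"]

    def random_wff(depth=2):
        # Build a random implication-only formula.
        if depth == 0 or random.random() < 0.4:
            return random.choice(ATOMS)
        return "( " + random_wff(depth - 1) + " -> " + random_wff(depth - 1) + " )"

    def ax_mp_example():
        # One modus ponens instance: from phi and (phi -> psi), infer psi.
        phi, psi = random_wff(), random_wff()
        return {"min": "|- " + phi,
                "maj": "|- ( " + phi + " -> " + psi + " )",
                "concl": "|- " + psi}

    for _ in range(3):
        print(ax_mp_example())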

3. In terms of guiding the AI, one thing that can help is to ask it to 
first outline the proof and then use that outline to fill in the steps; 
for instance, I'll put some ChatGPT output for sqrt2irr below. That way 
it can reason at higher levels as well as just with the details.
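
Roughly this kind of two-stage loop is what I mean, where ask_llm is 
just a placeholder for whatever model/API is actually in use and the 
prompts are only illustrative:

    def ask_llm(prompt):
        # Placeholder: plug in whatever model/API you're using.
        raise NotImplementedError

    def prove_with_outline(theorem):
        # Stage 1: ask for a high-level outline of the proof.
        outline = ask_llm("List the main steps to prove: " + theorem)
        steps = [s for s in outline.splitlines() if s.strip()]
        # Stage 2: expand each outline step into detailed reasoning.
        return [ask_llm("We are proving: " + theorem + "\n"
                        "Outline:\n" + outline + "\n"
                        "Fill in the details of this step: " + s)
                for s in steps]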

4. Completely agree about starting out with pretrained models, especially 
as some are already using Metamath as a training database.
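
Just as a sketch of that starting point (this assumes the Hugging Face 
transformers package, and "gpt2" is only a stand-in checkpoint):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Start from an existing pretrained model/tokenizer rather than from
    # scratch; fine-tuning on Metamath proof text would begin here.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")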

5. Having the AI check its own work step by step with a verifier would 
help a lot, I think. Without that it would be too easy to get a few 
symbols wrong and then get lost. Great idea.
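
Something like this loop is what I have in mind, with propose_next_step 
standing in for the model and verify standing in for a real Metamath 
proof checker run on the partial proof:

    def propose_next_step(proof):
        raise NotImplementedError   # model goes here

    def verify(proof):
        raise NotImplementedError   # real Metamath verifier goes here

    def generate_checked_proof(max_steps=100, max_retries=5):
        proof = []
        for _ in range(max_steps):
            for _ in range(max_retries):
                candidate = proof + [propose_next_step(proof)]
                if verify(candidate):     # only keep steps the checker accepts
                    proof = candidate
                    break
            else:
                return proof              # nothing verified; stop rather than guess
        return proof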

In terms of the control problem, I think that formal reasoning is actually 
a really helpful and positive thing. IMO it's inscrutable deep learning 
systems that are dangerous; a formally controlled AGI, where you could 
prove its code operated under certain rules, would be much better and 
safer. A rule like "you are allowed to make a smarter copy of yourself 
only if you can prove that it will be governed by the same constraints 
that you are" gives at least some hope of things staying under control 
over time.

So yeah I think advancing the ability to reason about code and formal 
structures is hugely helpful for alignment, rather than hindering it.

===================

what are the main steps to prove root 2 is irrational?

The proof that the square root of 2 is irrational involves several steps, 
including:

Assume that the square root of 2 is rational and can be expressed as a 
fraction p/q in its lowest terms, where p and q are positive integers.

Square both sides of the equation to obtain 2 = p^2/q^2.

Rearrange the equation to obtain p^2 = 2q^2.

Note that p^2 must be an even number, since it equals 2q^2. Therefore p 
must be even as well (the square of an odd number is odd).

Substitute 2n for p in the equation, where n is an integer. This gives 
(2n)^2 = 2q^2, or 4n^2 = 2q^2.

Simplify the equation to obtain 2n^2 = q^2. This shows that q^2 is even, 
and therefore q must be even as well.

However, this contradicts our assumption that p/q is in its lowest terms, 
since both p and q are even. Therefore, our initial assumption that the 
square root of 2 is rational must be false.

Conclude that the square root of 2 is irrational, meaning that it cannot be 
expressed as a ratio of two integers.
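
(For reference, the Metamath version of this theorem in set.mm is 
sqrt2irr; if I remember right, Lean's Mathlib states the same fact as 
irrational_sqrt_two:)

    import Mathlib

    -- The statement the prose argument above establishes.
    example : Irrational (Real.sqrt 2) := irrational_sqrt_two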

On Sunday, March 26, 2023 at 12:17:15 AM UTC David A. Wheeler wrote:

> As you know, AI (artificial intelligence) / ML (machine learning)
> systems have *already* generated Metamath proofs.
>
> However, the recent breakthroughs in AI/ML, particularly in
> large language models (LLMs) like GPT-3 & GPT-4 (which is really 
> multimodal)
> make me hope that these newer systems can create (or help create) proofs 
> for the
> Metamath system, such as for the set.mm and iset.mm databases.
> I'm hoping some people reading this email will be doing that!
>
> I've had some experience over the years with AI/ML systems, so I have
> some speculations of how we might help such systems be more likely to
> be successful. Like all speculations, they may not be true; they are
> hypotheses based on educated guesswork. Still, hypotheses are a good
> starting point for experimentation, and I'm hoping that some of them
> will be correct & lead to better proofs.
>
> Some of this thinking was inspired by the paper by Bubeck et al,
> "Sparks of Artificial General Intelligence: Early experiments with GPT-4"
> <https://arxiv.org/abs/2303.12712>. You might want to read it and/or see the 
> summary video
> "'Sparks of AGI' - Bombshell GPT-4 Paper: Fully Read w/ 15 Revelations"
> by AI explained <https://www.youtube.com/watch?v=Mqg3aTGNxZ0>
>
>
> --- David A. Wheeler
>
> === Speculations ===
>
> 1. Many of the newer generative models have an odd limitation: they start
> generating text without knowing where they'll end up as a conclusion.
> This is highlighted by Bubeck et al several times as "lack of planning".
>
> One way to address this limitation today would be to train the system & have 
> it
> generate metamath proofs "backwards". That is, the first statement is the 
> goal,
> followed by the items supporting it (repeatedly). Many humans develop 
> proofs this
> way anyway (start with what you want, then work backwards).
>
> Of course, this is a general problem, and there may be general solutions.
> "Memory Augmented Large Language Models are Computationally Universal"
> by Dale Schuurmans <https://arxiv.org/abs/2301.04589> shows that
> "transformer-based large language models are computationally universal 
> when augmented with an external memory."
> Even with such augmentations, it may still be easier for proof generation 
> tools to
> work "backwards" by default, as that is likely to *tend* to be easier for 
> them.
>
> 2. More training data == better results in general.
> Newer systems & models can learn more quickly, but it's still true that 
> more training
> data tends to produce better results. Therefore, I think it'd be helpful to
> *automatically* generate proofs with a variety of terms to help the system
> learn the underlying rule. For example, set.mm's ax-mp says:
> min ⊢ 𝜑
> maj ⊢ (𝜑 → 𝜓)
> ax-mp ⊢ 𝜓
> ... but it would be easy to generate examples that use the general 
> assertion.
> If you don't care exactly what you're proving, as long as it's true, you 
> can generate
> lots of true statements :-). But this would give the systems more to work 
> with.
>
> Databricks says that their work "suggests that much of the qualitative 
> gains in state-of-the-art models like ChatGPT may owe to focused corpuses 
> of instruction-following training data, rather than larger or better-tuned 
> base models."
> <
> https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
> >
>
> 3. Critical reasoning lacks an “inner dialogue”. Newer models *are* able 
> to create
> models to answer questions to a limited extent, but they're not great at 
> it, and this
> really hurts their mathematical capabilities. See the Bubeck paper for 
> more.
>
> I think this can be helped by training the system on *not*
> just the sequence of steps, but by training it in a way that shows the 
> *results* of
> each intermediate step (the text after "|-" in set.mm and iset.mm). This 
> provides the
> models with more information on the internal state, instead of having to 
> figure it out.
>
> 4. Starting with existing models is probably the better step forward.
> Instead of starting from scratch, I'd start with a pre-trained model.
> Those seem to have worked out broader concepts of locality that make 
> results
> more likely to be good.
>
> 5. Supporting tool use might be helpful.
> It might be helpful to have pre-created "tools" that automatically prove 
> common cases.
> One of the more remarkable capabilities of GPT-4 & friends is that they 
> can invoke
> *other* tools. Just like tool use by humans, tool use by LLMs makes them 
> able to handle
> far more complex problems than they could otherwise. Being able to invoke 
> something
> that can generate solutions for tasks it would otherwise struggle with 
> (like a SAT solver
> or numeric calculation proof) might make hard tasks trivial.
>
> 6. In the long run this may be completely doable on a single computer or 
> even phone.
> There are also many efforts to create open models that can be used for any 
> purpose
> (OpenAI doesn't create any open models, its name is rather misleading).
> Really, there's a *lot* of work just *starting* to make these models train 
> and/or run on
> single (disconnected) systems instead of servers. Historically, making 
> things work at *all* usually
> comes first; once something works, others take steps to make them more 
> efficient/cheaper.
> The same is true here. There are lots of discussions about this, e.g.:
> https://news.ycombinator.com/item?id=35141531
> https://news.ycombinator.com/item?id=35184985
>
> https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
>
> I *am* concerned about problems of AI safety and especially AI alignment.
> In the short term, we're going to be overwhelmed by spam & propaganda, 
> making truth
> about the world even harder to determine.
> In the long term, I worry about AI alignment; universal paperclip is now a 
> plausible long-term scenario.
> However, I see no issue in creating tools to generate proofs that are then
> separately verified, so I'd love to see that.
>
>
