Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

David A. Wheeler Wed, 15 Feb 2023 16:59:23 -0800

> On Feb 14, 2023, at 1:46 PM, Jon P <[email protected]> wrote:
> 
> I think that's a great idea David! Seems like very low downside and it would 
> be interesting to see if language models could train on the datasets. 
> 
> There's already an option to Customise GPT3 with additional datasets so once 
> this is ready it could be an interesting experiment to try.
> 
> https://openai.com/blog/customized-gpt-3/

Thanks so much for the link! That *is* interesting. That links to some tips:
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
Among their tips:
• "Aim for at least ~500 examples".
We certainly have that!!
• "Ensure that the prompt + completion doesn't exceed 2048 tokens, including 
the separator".
That will exclude some of our proofs. However, if I use axtgcgrrflx as an 
example,
its goal (with 6 hypotheses) and 25-step proof only needs 942 words if I filter
it the way I proposed (and I could shave off a few more words).

So I guess we don't need to wait for someone to scan the web.
We will still need to figure out how to rig the examples to maximize the 
training.
That's way more black art than science. A few links:
https://www.leewayhertz.com/build-a-gpt-model/
https://research.aimultiple.com/large-language-model-training/
https://www.datacamp.com/blog/a-beginners-guide-to-gpt-3

If you're not familiar with large language models (LLMs), a helpful explanation 
is
"Talking About Large Language Models" <https://arxiv.org/abs/2212.03551>.
Fundamentally, these systems look at previous words & statistically
predict likely continuations; with large datasets & parameters, sometimes they 
can do that well.

OpenAI makes it easy to start with their huge GPT-3 model, though it costs money
(I can't get any free credits because I already used some for work).
Kaggle.com lets you run smaller models for free, but smaller models tend to be 
much less capable.

If anyone has thoughts about applying ML generative text models to Metamath 
proofs
I'd love to hear them.

--- David A. Wheeler

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/962ADA90-5DBE-40BD-BEC5-F38C00369E17%40dwheeler.com.

Re: [Metamath] Idea: Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp Generative Pre-trained Transformers (GPTs)

Reply via email to