Hi Johnathan, I read why you didn't include Metamath; but to help me understand your paper in the Metamath "environment":
- To keep things simple, assume we had a set.mm with all proofs in normal mode (no compression).
- Assume you use symbols + labels + Metamath reserved keywords (in set.mm) as your tokens (no BPE applied).
- Assume you got a transformer T from your genetic optimizer (on hyperparameters) at generation i.
- Then, do you fit T's parameters on (let's say) 90% of the proofs in set.mm?
- Then, do you measure "learnability" by how good T is at predicting the next token in the 10% held out of sample?
- Would that mean the next token in a wff, or the next label in a proof?

Is this roughly how it would go in the Metamath framework? (I've put a rough sketch of what I'm imagining in a P.S. below.) I'm trying to look at it from a proof-assistant standpoint, where maybe the most intriguing part would be to "generate" the next missing label in a proof (where "next" could be "one" of the missing labels).

Thanks in advance for your help

Glauco
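P.S. To make the question concrete, here is a rough Python sketch of the pipeline I'm imagining. It is only my guess, not something from your paper: the `predict_next` interface, the 90/10 split, and the way wff and proof tokens are concatenated into one sequence are all assumptions on my part. The one Metamath-specific fact it leans on is that the language is whitespace-delimited, so a plain split already yields symbols + labels + $-keywords as tokens.

```python
# A minimal sketch of the setup described above -- my assumptions, not the paper's.
import random
import re

def strip_comments(mm_text):
    """Drop $( ... $) comments; Metamath comments do not nest."""
    return re.sub(r'\$\(.*?\$\)', ' ', mm_text, flags=re.S)

def tokenize(text):
    """Metamath source is whitespace-delimited, so a plain split yields
    exactly the math symbols, labels, and $-keywords as tokens (no BPE)."""
    return text.split()

def extract_proofs(mm_text):
    """Pull each $p statement apart: 'label $p <wff> $= <proof> $.'.
    In normal (uncompressed) mode the proof part is just a label list."""
    pattern = re.compile(r'(\S+)\s+\$p\s+(.*?)\$=\s+(.*?)\$\.', re.S)
    return [(label, tokenize(wff), tokenize(proof))
            for label, wff, proof in pattern.findall(strip_comments(mm_text))]

def next_token_accuracy(model, proofs):
    """'Learnability' as next-token accuracy on held-out statements,
    scoring both wff tokens and proof labels. `model.predict_next` is
    a hypothetical interface I made up for this sketch."""
    correct = total = 0
    for _label, wff, proof in proofs:
        seq = wff + ['$='] + proof
        for i in range(1, len(seq)):          # predict seq[i] from seq[:i]
            correct += (model.predict_next(seq[:i]) == seq[i])
            total += 1
    return correct / max(total, 1)

if __name__ == '__main__':
    proofs = extract_proofs(open('set.mm').read())
    random.seed(0)
    random.shuffle(proofs)
    cut = int(0.9 * len(proofs))              # 90% for fitting T's parameters
    train, held_out = proofs[:cut], proofs[cut:]
    # ...fit the transformer T (hyperparameters from generation i of the
    # genetic optimizer) on `train`, then:
    # print(next_token_accuracy(T, held_out))
```

If that picture is right, the "missing label" case I'm most curious about would just be this loop restricted to positions after the $= token.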
