Re: [Metamath] Future website directions

Mario Carneiro Sat, 25 Feb 2023 14:33:18 -0800

A page with short descriptions of papers and projects using Metamath for AI
would be great to put on the website. I think it would be good publicity
for both metamath and the AI systems, and maybe we can solicit the paper
authors to write the descriptions and provide related links as well.


On Sat, Feb 25, 2023 at 5:27 PM Jon P <[email protected]> wrote:

> Re the subject of LLMs being trained on metamath this new model called
> LLama from facebook is interesting. It seems to be smaller than other
> systems and it looks like they're already training on metamath.
>
> https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
>
> It links to this page and this paper which talk about the matheamtical
> aspects
>
> https://ai.facebook.com/blog/ai-math-theorem-proving/
>
> https://arxiv.org/abs/2205.11491
>
> It's interesting they talk about the "metamath benchmark". Maybe one
> approach is to take that and have a page which presents the metamath
> database for training (kind of like the .txt files or json or whatever is
> preferred) and then has some notes about the papers that have already been
> published about it (gpt-f, LLama) and their results. Might be interesting.
>
> On Sunday, February 19, 2023 at 10:39:55 PM UTC David A. Wheeler wrote:
>
>>
>> > On Sun, Feb 19, 2023 at 2:19 PM David A. Wheeler <[email protected]>
>> wrote:
>> > My first choice when building a website, where it's easily achieved, is
>> static rendering (statically-generated pages) where the most important data
>> is visible without client-side JavaScript. Those are easily understood,
>> incredibly hard to attack, trivial to configure, trivial to mirror, support
>> security/privacy-conscious users, etc. Some sites *cannot* be developed
>> this way, of course!! But using that as the "starting point" helps clarify
>> "what else do you need?" - that is, it forces a discussion of "what are the
>> unspoken requirements?". It's also where we are in the metamath site.
>> >
>> > With pre-rendering as a replacement for dynamic server side stuff, we
>> can still do most of the things. For example we could prerender all the
>> json queries and use client side JS to request them when doing cross
>> linking stuff. This is only made difficult if the set of all queries is
>> infinite, for example if we can do a search query.
>>
>>
>> > On Feb 19, 2023, at 3:13 PM, Mario Carneiro <[email protected]> wrote:
>> > The main advantages of dynamic rendering are:
>> > (1) quicker turnaround time, e.g. if we want to hastily put up a mm100
>> theorem on the website;
>> > (2) faster iteration - it is possible to make changes to the server and
>> see the results ~immediately;
>> > (3) lower storage cost, since we won't have to generate all the HTML
>> for the web site in advance and just cache important stuff (I assume that
>> most of the work going into the website build is wasted, since no one looks
>> at that page before it is next regenerated);
>> > (4) the ability to have many more rendering styles (unicode, gif, text,
>> latex, mathml, json have all been considered and more are possible) and
>> more databases, which is a combinatorial problem for the website build
>> limited by storage space and generation time.
>>
>> Sure. As always, it's all about trade-offs :-).
>>
>> I don't know if quicker turnaround time & such is all that important.
>> A "local" server run (by, say, a tool that called metamath-knife) would
>> have the same
>> tooling problems: You'd need to set up additional tools to run the tool
>> locally.
>>
>> Of that list, the ability to have more rendering styles is possibly the
>> most compelling of that list.
>> But exactly what rendering styles?
>>
>>
>> > > We can obviously use a program to dynamically generate many of the
>> pages instead, but what would it need to do?
>> >
>> > For the first version, I would aim simply to replicate the behavior /
>> look and feel of the current metamath web pages. That's what I implemented
>> in https://github.com/digama0/mm-web-rs . Once we have something that
>> can potentially replace the functionality of metamath-exe it will be easier
>> to consider incremental improvements, in an architecture where we can do
>> fancy things if we want.
>>
>> Okay, but what is desirable that is *different* long term? I think there
>> are plausible good answers,
>> but it'd be good to have some idea about them.
>>
>> > One technique that I use in the MM0 web pages is to use the HTML as the
>> data source: put all the proofs in a table like they would be normally, but
>> perhaps not as marked up as they should be. Then postprocess it with JS to
>> add all the markup and make additional server requests if needed. That
>> solves the "make the JS-disabled experience palatable" problem, since you
>> just get the basic version of the page if no JS is running.
>>
>> Yup! Sounds like a great approach.
>>
>> > The sort of thing that might require additional data is parsing
>> support; for example with MM0 pages the formulas are printed in ascii but
>> ideally they would reflow using the layout algorithm, using correct
>> indentation. Metamath web pages are horrible at this; they get good
>> per-token rendering but reflowing is garbage so hopefully you just don't
>> have too-long lines. mmj2 has a nice indenting formatter (actually there
>> are several algorithms available); wouldn't it be nice if that could be
>> done in-browser so that it adapts to the viewport width? If the lines are
>> sent in HTML as token strings, then that means that the JS has to do the
>> parsing itself so it needs to know the signatures of declarations like
>> 'wi'.
>>
>> An alternative is to provide HTML in a table (or some other grid), along
>> with something that's simpler to parse but not shown to someone just
>> looking at the HTML. It could be in an attribute, something with "display:
>> none", whatever. If there's a client-side renderer being used, I would want
>> it to be *easy* to write, not make it parse complex HTML. The individual
>> theorems aren't big, so it'd be pretty easy to include both in one file.
>>
>> > Also expensive queries can be hidden behind an expandable drop-down and
>> a server request. We probably want to keep axioms-used and used-by lists
>> enabled by default even though they are somewhat expensive (although if the
>> server has some time to warm up and precompute some data structures then it
>> can cheaply answer these requests), but there are some other requests along
>> the same lines, like "how many steps would this proof be if everything was
>> expanded to axioms" (which is fun if you like to look at numbers in the
>> range of 10^50), which are accessible to metamath-exe but not shown on the
>> web page.
>>
>> Agreed. You can do the same with pregenerated static files, BTW, where
>> searches bring you to a *dynamic* program that does the search & provides
>> the results.
>>
>> An alternative to a lengthy warm-up each time would be to precompute some
>> data structures & answers, then store them in a database (probably using
>> SQLite, but MySQL or PostgreSQL would work, though perhaps even a simple
>> key/value store would be enough). Or do the lengthy warm-up only when that
>> .mm has changed, then store that in a database. Then server restarts can be
>> fast when the .mm file hasn't changed.
>>
>> > > I don't know if we need a JSON format for the .mm file as a whole.
>> ...
>> >
>> > I agree with this. The main use case for JSON is at the per-theorem
>> level; a JSON for the mm file as a whole would be huge and you would be
>> better off parsing MM directly at that point, since it would be more
>> efficient. Some other potential files are table-of-contents like things,
>> which cover the whole mm file but only represent one 'aspect' of the
>> theorems. So an actual table of contents (the section name hierarchy), the
>> theorem pages (i.e. give the statements of theorems 1000-1100), the
>> definition list, the grammar (syntax axioms). The bare list of all theorem
>> names, in order, might also be a useful query - I would just give this in
>> plain text, newline separated. Maybe include the $f/$e/$a/$p
>> classifications too.
>>
>> --- David A. Wheeler
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Metamath" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com
> <https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/CAFXXJSvuZvXLJcZHu4_ZsE8Oa2OVKEMkMqNsBLdcwqqdP8p_9A%40mail.gmail.com.

Re: [Metamath] Future website directions

Reply via email to