Re: [Metamath] Future website directions

David A. Wheeler Sun, 26 Feb 2023 08:52:07 -0800

I completely agree, a page on AI/ML would be great. The pages could do with a 
refresh anyway, many places suggest contacting Norm.


Norm used an HTML 1.0 convention of using uppercase tags. I suggest that new 
text use lowercase tags as a style choice
It doesn't matter technically, but it is the current convention, and it would 
make it easier to see which parts are unchanged.

On February 25, 2023 5:33:02 PM EST, Mario Carneiro <[email protected]> wrote:
>A page with short descriptions of papers and projects using Metamath for AI
>would be great to put on the website. I think it would be good publicity
>for both metamath and the AI systems, and maybe we can solicit the paper
>authors to write the descriptions and provide related links as well.
>
>On Sat, Feb 25, 2023 at 5:27 PM Jon P <[email protected]> wrote:
>
>> Re the subject of LLMs being trained on metamath this new model called
>> LLama from facebook is interesting. It seems to be smaller than other
>> systems and it looks like they're already training on metamath.
>>
>> https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
>>
>> It links to this page and this paper which talk about the matheamtical
>> aspects
>>
>> https://ai.facebook.com/blog/ai-math-theorem-proving/
>>
>> https://arxiv.org/abs/2205.11491
>>
>> It's interesting they talk about the "metamath benchmark". Maybe one
>> approach is to take that and have a page which presents the metamath
>> database for training (kind of like the .txt files or json or whatever is
>> preferred) and then has some notes about the papers that have already been
>> published about it (gpt-f, LLama) and their results. Might be interesting.
>>
>> On Sunday, February 19, 2023 at 10:39:55 PM UTC David A. Wheeler wrote:
>>
>>>
>>> > On Sun, Feb 19, 2023 at 2:19 PM David A. Wheeler <[email protected]>
>>> wrote:
>>> > My first choice when building a website, where it's easily achieved, is
>>> static rendering (statically-generated pages) where the most important data
>>> is visible without client-side JavaScript. Those are easily understood,
>>> incredibly hard to attack, trivial to configure, trivial to mirror, support
>>> security/privacy-conscious users, etc. Some sites *cannot* be developed
>>> this way, of course!! But using that as the "starting point" helps clarify
>>> "what else do you need?" - that is, it forces a discussion of "what are the
>>> unspoken requirements?". It's also where we are in the metamath site.
>>> >
>>> > With pre-rendering as a replacement for dynamic server side stuff, we
>>> can still do most of the things. For example we could prerender all the
>>> json queries and use client side JS to request them when doing cross
>>> linking stuff. This is only made difficult if the set of all queries is
>>> infinite, for example if we can do a search query.
>>>
>>>
>>> > On Feb 19, 2023, at 3:13 PM, Mario Carneiro <[email protected]> wrote:
>>> > The main advantages of dynamic rendering are:
>>> > (1) quicker turnaround time, e.g. if we want to hastily put up a mm100
>>> theorem on the website;
>>> > (2) faster iteration - it is possible to make changes to the server and
>>> see the results ~immediately;
>>> > (3) lower storage cost, since we won't have to generate all the HTML
>>> for the web site in advance and just cache important stuff (I assume that
>>> most of the work going into the website build is wasted, since no one looks
>>> at that page before it is next regenerated);
>>> > (4) the ability to have many more rendering styles (unicode, gif, text,
>>> latex, mathml, json have all been considered and more are possible) and
>>> more databases, which is a combinatorial problem for the website build
>>> limited by storage space and generation time.
>>>
>>> Sure. As always, it's all about trade-offs :-).
>>>
>>> I don't know if quicker turnaround time & such is all that important.
>>> A "local" server run (by, say, a tool that called metamath-knife) would
>>> have the same
>>> tooling problems: You'd need to set up additional tools to run the tool
>>> locally.
>>>
>>> Of that list, the ability to have more rendering styles is possibly the
>>> most compelling of that list.
>>> But exactly what rendering styles?
>>>
>>>
>>> > > We can obviously use a program to dynamically generate many of the
>>> pages instead, but what would it need to do?
>>> >
>>> > For the first version, I would aim simply to replicate the behavior /
>>> look and feel of the current metamath web pages. That's what I implemented
>>> in https://github.com/digama0/mm-web-rs . Once we have something that
>>> can potentially replace the functionality of metamath-exe it will be easier
>>> to consider incremental improvements, in an architecture where we can do
>>> fancy things if we want.
>>>
>>> Okay, but what is desirable that is *different* long term? I think there
>>> are plausible good answers,
>>> but it'd be good to have some idea about them.
>>>
>>> > One technique that I use in the MM0 web pages is to use the HTML as the
>>> data source: put all the proofs in a table like they would be normally, but
>>> perhaps not as marked up as they should be. Then postprocess it with JS to
>>> add all the markup and make additional server requests if needed. That
>>> solves the "make the JS-disabled experience palatable" problem, since you
>>> just get the basic version of the page if no JS is running.
>>>
>>> Yup! Sounds like a great approach.
>>>
>>> > The sort of thing that might require additional data is parsing
>>> support; for example with MM0 pages the formulas are printed in ascii but
>>> ideally they would reflow using the layout algorithm, using correct
>>> indentation. Metamath web pages are horrible at this; they get good
>>> per-token rendering but reflowing is garbage so hopefully you just don't
>>> have too-long lines. mmj2 has a nice indenting formatter (actually there
>>> are several algorithms available); wouldn't it be nice if that could be
>>> done in-browser so that it adapts to the viewport width? If the lines are
>>> sent in HTML as token strings, then that means that the JS has to do the
>>> parsing itself so it needs to know the signatures of declarations like
>>> 'wi'.
>>>
>>> An alternative is to provide HTML in a table (or some other grid), along
>>> with something that's simpler to parse but not shown to someone just
>>> looking at the HTML. It could be in an attribute, something with "display:
>>> none", whatever. If there's a client-side renderer being used, I would want
>>> it to be *easy* to write, not make it parse complex HTML. The individual
>>> theorems aren't big, so it'd be pretty easy to include both in one file.
>>>
>>> > Also expensive queries can be hidden behind an expandable drop-down and
>>> a server request. We probably want to keep axioms-used and used-by lists
>>> enabled by default even though they are somewhat expensive (although if the
>>> server has some time to warm up and precompute some data structures then it
>>> can cheaply answer these requests), but there are some other requests along
>>> the same lines, like "how many steps would this proof be if everything was
>>> expanded to axioms" (which is fun if you like to look at numbers in the
>>> range of 10^50), which are accessible to metamath-exe but not shown on the
>>> web page.
>>>
>>> Agreed. You can do the same with pregenerated static files, BTW, where
>>> searches bring you to a *dynamic* program that does the search & provides
>>> the results.
>>>
>>> An alternative to a lengthy warm-up each time would be to precompute some
>>> data structures & answers, then store them in a database (probably using
>>> SQLite, but MySQL or PostgreSQL would work, though perhaps even a simple
>>> key/value store would be enough). Or do the lengthy warm-up only when that
>>> .mm has changed, then store that in a database. Then server restarts can be
>>> fast when the .mm file hasn't changed.
>>>
>>> > > I don't know if we need a JSON format for the .mm file as a whole.
>>> ...
>>> >
>>> > I agree with this. The main use case for JSON is at the per-theorem
>>> level; a JSON for the mm file as a whole would be huge and you would be
>>> better off parsing MM directly at that point, since it would be more
>>> efficient. Some other potential files are table-of-contents like things,
>>> which cover the whole mm file but only represent one 'aspect' of the
>>> theorems. So an actual table of contents (the section name hierarchy), the
>>> theorem pages (i.e. give the statements of theorems 1000-1100), the
>>> definition list, the grammar (syntax axioms). The bare list of all theorem
>>> names, in order, might also be a useful query - I would just give this in
>>> plain text, newline separated. Maybe include the $f/$e/$a/$p
>>> classifications too.
>>>
>>> --- David A. Wheeler
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Metamath" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com
>> <https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>-- 
>You received this message because you are subscribed to a topic in the Google 
>Groups "Metamath" group.
>To unsubscribe from this topic, visit 
>https://groups.google.com/d/topic/metamath/untRQm4N-BI/unsubscribe.
>To unsubscribe from this group and all its topics, send an email to 
>[email protected].
>To view this discussion on the web visit 
>https://groups.google.com/d/msgid/metamath/CAFXXJSvuZvXLJcZHu4_ZsE8Oa2OVKEMkMqNsBLdcwqqdP8p_9A%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/E401E28C-3B6C-4AA5-AB52-F29D292C589C%40dwheeler.com.

Re: [Metamath] Future website directions

Reply via email to