I completely agree, a page on AI/ML would be great. The pages could do with a refresh anyway, many places suggest contacting Norm.
Norm used an HTML 1.0 convention of using uppercase tags. I suggest that new text use lowercase tags as a style choice It doesn't matter technically, but it is the current convention, and it would make it easier to see which parts are unchanged. On February 25, 2023 5:33:02 PM EST, Mario Carneiro <[email protected]> wrote: >A page with short descriptions of papers and projects using Metamath for AI >would be great to put on the website. I think it would be good publicity >for both metamath and the AI systems, and maybe we can solicit the paper >authors to write the descriptions and provide related links as well. > >On Sat, Feb 25, 2023 at 5:27 PM Jon P <[email protected]> wrote: > >> Re the subject of LLMs being trained on metamath this new model called >> LLama from facebook is interesting. It seems to be smaller than other >> systems and it looks like they're already training on metamath. >> >> https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ >> >> It links to this page and this paper which talk about the matheamtical >> aspects >> >> https://ai.facebook.com/blog/ai-math-theorem-proving/ >> >> https://arxiv.org/abs/2205.11491 >> >> It's interesting they talk about the "metamath benchmark". Maybe one >> approach is to take that and have a page which presents the metamath >> database for training (kind of like the .txt files or json or whatever is >> preferred) and then has some notes about the papers that have already been >> published about it (gpt-f, LLama) and their results. Might be interesting. >> >> On Sunday, February 19, 2023 at 10:39:55 PM UTC David A. Wheeler wrote: >> >>> >>> > On Sun, Feb 19, 2023 at 2:19 PM David A. Wheeler <[email protected]> >>> wrote: >>> > My first choice when building a website, where it's easily achieved, is >>> static rendering (statically-generated pages) where the most important data >>> is visible without client-side JavaScript. Those are easily understood, >>> incredibly hard to attack, trivial to configure, trivial to mirror, support >>> security/privacy-conscious users, etc. Some sites *cannot* be developed >>> this way, of course!! But using that as the "starting point" helps clarify >>> "what else do you need?" - that is, it forces a discussion of "what are the >>> unspoken requirements?". It's also where we are in the metamath site. >>> > >>> > With pre-rendering as a replacement for dynamic server side stuff, we >>> can still do most of the things. For example we could prerender all the >>> json queries and use client side JS to request them when doing cross >>> linking stuff. This is only made difficult if the set of all queries is >>> infinite, for example if we can do a search query. >>> >>> >>> > On Feb 19, 2023, at 3:13 PM, Mario Carneiro <[email protected]> wrote: >>> > The main advantages of dynamic rendering are: >>> > (1) quicker turnaround time, e.g. if we want to hastily put up a mm100 >>> theorem on the website; >>> > (2) faster iteration - it is possible to make changes to the server and >>> see the results ~immediately; >>> > (3) lower storage cost, since we won't have to generate all the HTML >>> for the web site in advance and just cache important stuff (I assume that >>> most of the work going into the website build is wasted, since no one looks >>> at that page before it is next regenerated); >>> > (4) the ability to have many more rendering styles (unicode, gif, text, >>> latex, mathml, json have all been considered and more are possible) and >>> more databases, which is a combinatorial problem for the website build >>> limited by storage space and generation time. >>> >>> Sure. As always, it's all about trade-offs :-). >>> >>> I don't know if quicker turnaround time & such is all that important. >>> A "local" server run (by, say, a tool that called metamath-knife) would >>> have the same >>> tooling problems: You'd need to set up additional tools to run the tool >>> locally. >>> >>> Of that list, the ability to have more rendering styles is possibly the >>> most compelling of that list. >>> But exactly what rendering styles? >>> >>> >>> > > We can obviously use a program to dynamically generate many of the >>> pages instead, but what would it need to do? >>> > >>> > For the first version, I would aim simply to replicate the behavior / >>> look and feel of the current metamath web pages. That's what I implemented >>> in https://github.com/digama0/mm-web-rs . Once we have something that >>> can potentially replace the functionality of metamath-exe it will be easier >>> to consider incremental improvements, in an architecture where we can do >>> fancy things if we want. >>> >>> Okay, but what is desirable that is *different* long term? I think there >>> are plausible good answers, >>> but it'd be good to have some idea about them. >>> >>> > One technique that I use in the MM0 web pages is to use the HTML as the >>> data source: put all the proofs in a table like they would be normally, but >>> perhaps not as marked up as they should be. Then postprocess it with JS to >>> add all the markup and make additional server requests if needed. That >>> solves the "make the JS-disabled experience palatable" problem, since you >>> just get the basic version of the page if no JS is running. >>> >>> Yup! Sounds like a great approach. >>> >>> > The sort of thing that might require additional data is parsing >>> support; for example with MM0 pages the formulas are printed in ascii but >>> ideally they would reflow using the layout algorithm, using correct >>> indentation. Metamath web pages are horrible at this; they get good >>> per-token rendering but reflowing is garbage so hopefully you just don't >>> have too-long lines. mmj2 has a nice indenting formatter (actually there >>> are several algorithms available); wouldn't it be nice if that could be >>> done in-browser so that it adapts to the viewport width? If the lines are >>> sent in HTML as token strings, then that means that the JS has to do the >>> parsing itself so it needs to know the signatures of declarations like >>> 'wi'. >>> >>> An alternative is to provide HTML in a table (or some other grid), along >>> with something that's simpler to parse but not shown to someone just >>> looking at the HTML. It could be in an attribute, something with "display: >>> none", whatever. If there's a client-side renderer being used, I would want >>> it to be *easy* to write, not make it parse complex HTML. The individual >>> theorems aren't big, so it'd be pretty easy to include both in one file. >>> >>> > Also expensive queries can be hidden behind an expandable drop-down and >>> a server request. We probably want to keep axioms-used and used-by lists >>> enabled by default even though they are somewhat expensive (although if the >>> server has some time to warm up and precompute some data structures then it >>> can cheaply answer these requests), but there are some other requests along >>> the same lines, like "how many steps would this proof be if everything was >>> expanded to axioms" (which is fun if you like to look at numbers in the >>> range of 10^50), which are accessible to metamath-exe but not shown on the >>> web page. >>> >>> Agreed. You can do the same with pregenerated static files, BTW, where >>> searches bring you to a *dynamic* program that does the search & provides >>> the results. >>> >>> An alternative to a lengthy warm-up each time would be to precompute some >>> data structures & answers, then store them in a database (probably using >>> SQLite, but MySQL or PostgreSQL would work, though perhaps even a simple >>> key/value store would be enough). Or do the lengthy warm-up only when that >>> .mm has changed, then store that in a database. Then server restarts can be >>> fast when the .mm file hasn't changed. >>> >>> > > I don't know if we need a JSON format for the .mm file as a whole. >>> ... >>> > >>> > I agree with this. The main use case for JSON is at the per-theorem >>> level; a JSON for the mm file as a whole would be huge and you would be >>> better off parsing MM directly at that point, since it would be more >>> efficient. Some other potential files are table-of-contents like things, >>> which cover the whole mm file but only represent one 'aspect' of the >>> theorems. So an actual table of contents (the section name hierarchy), the >>> theorem pages (i.e. give the statements of theorems 1000-1100), the >>> definition list, the grammar (syntax axioms). The bare list of all theorem >>> names, in order, might also be a useful query - I would just give this in >>> plain text, newline separated. Maybe include the $f/$e/$a/$p >>> classifications too. >>> >>> --- David A. Wheeler >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "Metamath" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com >> <https://groups.google.com/d/msgid/metamath/3fcc0b30-42ad-4bda-9627-73aa59c66fa2n%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > >-- >You received this message because you are subscribed to a topic in the Google >Groups "Metamath" group. >To unsubscribe from this topic, visit >https://groups.google.com/d/topic/metamath/untRQm4N-BI/unsubscribe. >To unsubscribe from this group and all its topics, send an email to >[email protected]. >To view this discussion on the web visit >https://groups.google.com/d/msgid/metamath/CAFXXJSvuZvXLJcZHu4_ZsE8Oa2OVKEMkMqNsBLdcwqqdP8p_9A%40mail.gmail.com. -- You received this message because you are subscribed to the Google Groups "Metamath" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/E401E28C-3B6C-4AA5-AB52-F29D292C589C%40dwheeler.com.
