FYI: Our discussion topic seems to have broadened to discussing potential
future directions for the website, so I've changed the subject line to "Future
website directions". Its subject line was previously: "Re: [Metamath] Idea:
Release ".txt" proofs for set.mm databases ("mpetxt") for AI/ML tools, esp
Generative Pre-trained Transformers (GPTs)".
> On Feb 19, 2023, at 1:05 PM, Mario Carneiro <[email protected]> wrote:
>
> I don't see those as competing goals at all. They play different roles: for
> browsing you would ideally have some server supplying only the required
> information to render a specific proof, and for a proof assistant (and to
> some extent a search tool as well) you want to have the whole database
> available, or at least all the signatures of all the theorems and the body
> for the current theorem.
> .... when someone clicks the "start proof mode" button they will have an
> expectation that there will be a startup cost, so it is easy to justify
> loading set.mm and a streaming parser still helps for making that experience
> better.
100% agree on that. "See one proof" is one use case, "get 'whole' database"
(e.g., to create a proof) is another.
> While yes I agree that this would be much better done with metamath-knife
> than metamath-exe, I don't think there is a major concern here, or at least
> we can mostly mitigate the issues since this flow is extremely light on input
> from the attacker. Basically the only thing you can specify is the theorem
> you want to see. But for sure I'm not interested in doing this by building on
> metamath-exe, there are some great frameworks for writing these kind of
> servers in rust that should already handle most of the issues. The main thing
> to be concerned about is DOS attacks, and having a 'real' server like nginx
> in front helps with that.
>
> But our threat model is pretty much nonexistent to begin with; almost all the
> server data and code is public domain in the first place and even if the
> server is taken down it's not a critical service. I think a much more
> critical problem for the project is giving a perception of being an old
> unfriendly software used by few.
It's not a *critical* service absolutely, but there are a lot of attackers who
just attack everything using a variety of tools, then see what they can get out
of it. So I'd rather make it hard to successfully attack the site. I agree that
strong filtering would eliminate many problems.
My first choice when building a website, where it's easily achieved, is static
rendering (statically-generated pages) where the most important data is visible
without client-side JavaScript. Those are easily understood, incredibly hard to
attack, trivial to configure, trivial to mirror, support
security/privacy-conscious users, etc. Some sites *cannot* be developed this
way, of course!! But using that as the "starting point" helps clarify "what
else do you need?" - that is, it forces a discussion of "what are the unspoken
requirements?". It's also where we are in the metamath site.
A disadvantage fo static rendering is that you have to prerender all pages,
which takes time. I see that as a non-problem for our case; if someone can't
wait a few hours, they can use tools to locally generate the page themselves.
We can obviously use a program to dynamically generate many of the pages
instead, but what would it need to do?
> I expect to have a proposal for a JSON format - just a parse tree of a .mm
> file, really - early next month if not sooner.
I think the more interesting case is a JSON format for a *single* proof.
For a specific database, perhaps the website needs to provide:
* The .mm database.
* Introductory HTML pages (explaining what's going on).
* At least one HTML display for each statement (theorem/axiom).
* A JSON format for each statement, to make it easy for renderers to grab that
data & generate fancy displays. That implies the JSON should probably include
all the data in the current HTML generate page, including data about related
theorems & an expansion of every proof step.
* Auto-generated table-of-contents (HTML & JSON?)
I think it's important that at least 1 HTML format show "the basics" if
JavaScript isn't enabled. This is easily done by having progressive rendering
*or* an alternate HTML page. An alternate "fancy" HTML page could download the
JSON format and do many fancy things, without having to parse the .mm file.
I don't know if we need a JSON format for the .mm file as a whole. One approach
for doing that would be to generate a JSON file that in turn contains the
contents of every per-theorem JSON file. That would be a really big file, even
precompressed - does anyone *want* that? What do they plan to do with it? We
already compress the .mm file because they've gotten big. One possible use
would be training AI systems, but they'd need to do post-processing of that to
make it useful for them anyway. My earlier proposal involved trying to create a
format pre-processed to make it easier for them to read, because if they're
specially training an ML system, some preprocessing usually gets better results.
--- David A. Wheeler
--
You received this message because you are subscribed to the Google Groups
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/metamath/726640EF-94CA-4BD8-A52A-E7238C2331F2%40dwheeler.com.