On Sun, Feb 19, 2023 at 2:19 PM David A. Wheeler <[email protected]> wrote:
> > While yes I agree that this would be much better done with
> > metamath-knife than metamath-exe, I don't think there is a major concern
> > here, or at least we can mostly mitigate the issues since this flow is
> > extremely light on input from the attacker. Basically the only thing you
> > can specify is the theorem you want to see. But for sure I'm not
> > interested in doing this by building on metamath-exe; there are some
> > great frameworks for writing these kinds of servers in Rust that should
> > already handle most of the issues. The main thing to be concerned about
> > is DoS attacks, and having a 'real' server like nginx in front helps
> > with that.
> >
> > But our threat model is pretty much nonexistent to begin with; almost
> > all the server data and code is public domain in the first place, and
> > even if the server is taken down it's not a critical service. I think a
> > much more critical problem for the project is giving the perception of
> > being old, unfriendly software used by few.
>
> It's not a *critical* service absolutely, but there are a lot of attackers
> who just attack everything using a variety of tools, then see what they can
> get out of it. So I'd rather make it hard to successfully attack the site.
> I agree that strong filtering would eliminate many problems.

Obviously that wasn't to downplay the necessity of not putting buggy servers
on the internet. We should certainly take all reasonable precautions. But it
is important to keep in mind what the threat model is when talking about
security concerns, and ours is about as low as one can be for something that
is on the internet.

> My first choice when building a website, where it's easily achieved, is
> static rendering (statically-generated pages) where the most important data
> is visible without client-side JavaScript. Those are easily understood,
> incredibly hard to attack, trivial to configure, trivial to mirror, support
> security/privacy-conscious users, etc. Some sites *cannot* be developed
> this way, of course!! But using that as the "starting point" helps clarify
> "what else do you need?" - that is, it forces a discussion of "what are the
> unspoken requirements?". It's also where we are with the metamath site.

With pre-rendering as a replacement for dynamic server-side stuff, we can
still do most of the same things. For example, we could prerender all the
JSON queries and use client-side JS to request them when doing cross-linking.
This only becomes difficult if the set of possible queries is infinite, for
example if we allow search queries. The main advantages of dynamic rendering
are:

(1) quicker turnaround time, e.g. if we want to hastily put up a mm100
    theorem on the website;
(2) faster iteration - it is possible to make changes to the server and see
    the results ~immediately;
(3) lower storage cost, since we won't have to generate all the HTML for the
    web site in advance and can just cache the important stuff (I assume
    that most of the work going into the website build is wasted, since no
    one looks at a given page before it is next regenerated);
(4) the ability to have many more rendering styles (unicode, gif, text,
    latex, mathml, json have all been considered and more are possible) and
    more databases, which is a combinatorial problem for the website build,
    limited by storage space and generation time.
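To make the dynamic-rendering side of this concrete, here is a minimal
sketch of the kind of server I have in mind, assuming the axum framework
(with tokio) as one of those Rust server frameworks. The Database type, the
renderer, and the /mpeuni/:label route are hypothetical stand-ins, not
mm-web-rs's actual code:

    use std::sync::Arc;

    use axum::{extract::{Path, State}, response::Html, routing::get, Router};

    // Hypothetical in-memory database, built once at startup by parsing the
    // .mm file (e.g. with metamath-knife).
    struct Database {}

    // Hypothetical renderer producing the same HTML metamath-exe writes today.
    fn render_theorem_page(_db: &Database, label: &str) -> String {
        format!("<html><body>theorem page for {label}</body></html>")
    }

    async fn theorem_page(
        State(db): State<Arc<Database>>,
        Path(label): Path<String>,
    ) -> Html<String> {
        Html(render_theorem_page(&db, &label))
    }

    #[tokio::main]
    async fn main() {
        let db = Arc::new(Database {});
        // One dynamic route instead of one prerendered page per theorem
        // (path syntax as in axum 0.7).
        let app = Router::new()
            .route("/mpeuni/:label", get(theorem_page))
            .with_state(db);
        let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
        axum::serve(listener, app).await.unwrap();
    }

The same binary could serve as the *local* preview tool mentioned below: run
it against a local .mm file and every page is rendered on demand, with no
two-hour build step first.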
> A disadvantage of static rendering is that you have to prerender all
> pages, which takes time. I see that as a non-problem for our case; if
> someone can't wait a few hours, they can use tools to locally generate the
> page themselves.

For most web site readers, there is a HUGE gap between the effort required
to click on a link and the effort required to download the tool, figure out
how to make it work, and generate the page they want. I have a working setup
and even I would not bother to go to the trouble in most cases. (Even when I
do, the results often have broken links or missing GIF symbols because I
didn't do the full build or don't have the rest of the web site directory
hierarchy set up.) A tool which can act as a *local* server would be a
measurable improvement over piecemeal HTML generation, since you can give
the impression of a fully built website without having to wait two hours
first.

> We can obviously use a program to dynamically generate many of the pages
> instead, but what would it need to do?

For the first version, I would aim simply to replicate the behavior / look
and feel of the current metamath web pages. That's what I implemented in
https://github.com/digama0/mm-web-rs . Once we have something that can
potentially replace the functionality of metamath-exe, it will be easier to
consider incremental improvements, in an architecture where we can do fancy
things if we want.

> > I expect to have a proposal for a JSON format - just a parse tree of a
> > .mm file, really - early next month if not sooner.
>
> I think the more interesting case is a JSON format for a *single* proof.
>
> For a specific database, perhaps the website needs to provide:
> * The .mm database.
> * Introductory HTML pages (explaining what's going on).
> * At least one HTML display for each statement (theorem/axiom).
> * A JSON format for each statement, to make it easy for renderers to grab
>   that data & generate fancy displays. That implies the JSON should
>   probably include all the data in the currently generated HTML page,
>   including data about related theorems & an expansion of every proof step.
> * Auto-generated table-of-contents (HTML & JSON?)
>
> I think it's important that at least 1 HTML format show "the basics" if
> JavaScript isn't enabled. This is easily done by having progressive
> rendering *or* an alternate HTML page. An alternate "fancy" HTML page could
> download the JSON format and do many fancy things, without having to parse
> the .mm file.

One technique that I use in the MM0 web pages is to use the HTML as the data
source: put all the proofs in a table like they would be normally, but
perhaps not as marked up as they should be, and then postprocess it with JS
to add all the markup and make additional server requests if needed. That
solves the "make the JS-disabled experience palatable" problem, since you
just get the basic version of the page if no JS is running.

The sort of thing that might require additional data is parsing support; for
example, with MM0 pages the formulas are printed in ASCII, but ideally they
would reflow using the layout algorithm, with correct indentation. Metamath
web pages are horrible at this: they get good per-token rendering, but
reflowing is garbage, so hopefully you just don't have too-long lines. mmj2
has a nice indenting formatter (actually there are several algorithms
available); wouldn't it be nice if that could be done in-browser so that it
adapts to the viewport width? If the lines are sent in HTML as token
strings, then the JS has to do the parsing itself, so it needs to know the
signatures of declarations like 'wi'.
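As a sketch of the kind of "parsing support" data the server could ship for
that, here is one possible shape for a signature table, written in Rust with
serde/serde_json. The SyntaxSig fields, the example entries, and the
/api/syntax.json path are all hypothetical, not an existing format:

    use std::collections::BTreeMap;

    use serde::Serialize;

    // Hypothetical shape for the syntax data an in-browser formatter needs:
    // for each syntax axiom, its result typecode and argument typecodes in order.
    #[derive(Serialize)]
    struct SyntaxSig {
        typecode: String,  // e.g. "wff"
        args: Vec<String>, // e.g. ["wff", "wff"] for 'wi'
    }

    fn main() -> serde_json::Result<()> {
        let mut sigs = BTreeMap::new();
        sigs.insert("wi", SyntaxSig { typecode: "wff".into(), args: vec!["wff".into(), "wff".into()] });
        sigs.insert("wn", SyntaxSig { typecode: "wff".into(), args: vec!["wff".into()] });
        // A real server would generate this table from the database's syntax
        // $a statements and serve it once (say at a hypothetical
        // /api/syntax.json), so the client-side reflow code can parse token
        // strings without ever touching the .mm file.
        println!("{}", serde_json::to_string_pretty(&sigs)?);
        Ok(())
    }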
Also, expensive queries can be hidden behind an expandable drop-down and a
server request. We probably want to keep the axioms-used and used-by lists
enabled by default even though they are somewhat expensive (although if the
server has some time to warm up and precompute some data structures, then it
can cheaply answer these requests), but there are some other requests along
the same lines, like "how many steps would this proof be if everything was
expanded to axioms" (which is fun if you like to look at numbers in the
range of 10^50), which are accessible to metamath-exe but not shown on the
web page.

> I don't know if we need a JSON format for the .mm file as a whole. One
> approach for doing that would be to generate a JSON file that in turn
> contains the contents of every per-theorem JSON file. That would be a
> really big file, even precompressed - does anyone *want* that? What do they
> plan to do with it? We already compress the .mm file because they've gotten
> big. One possible use would be training AI systems, but they'd need to do
> post-processing of that to make it useful for them anyway. My earlier
> proposal involved trying to create a format pre-processed to make it easier
> for them to read, because if they're specially training an ML system, some
> preprocessing usually gets better results.

I agree with this. The main use case for JSON is at the per-theorem level; a
JSON file for the .mm database as a whole would be huge, and you would be
better off parsing the .mm file directly at that point, since it would be
more efficient. Some other potential files are table-of-contents-like
things, which cover the whole .mm file but only represent one 'aspect' of
the theorems: an actual table of contents (the section name hierarchy), the
theorem list pages (i.e. give the statements of theorems 1000-1100), the
definition list, the grammar (syntax axioms). The bare list of all theorem
names, in order, might also be a useful query - I would just give this in
plain text, newline separated. Maybe include the $f/$e/$a/$p classifications
too.
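To make the per-theorem JSON idea a bit more concrete, here is a rough
sketch (in Rust with serde) of the kind of fields such a document might
carry - essentially the data already on the generated HTML page. All field
names here are hypothetical, not a format anyone has agreed on:

    use serde::Serialize;

    // Rough sketch of a per-theorem JSON document: roughly the data that is
    // already on the generated HTML page for a statement.
    #[derive(Serialize)]
    struct TheoremJson {
        label: String,               // e.g. "id"
        kind: String,                // "$a" or "$p"
        typecode: String,            // e.g. "|-"
        assertion: Vec<String>,      // the statement as a list of math tokens
        hypotheses: Vec<Hypothesis>, // $f/$e hypotheses, in order
        comment: String,             // the associated description comment
        proof: Vec<ProofStep>,       // the step list shown on the proof page
        axioms_used: Vec<String>,    // axioms this proof ultimately depends on
        used_by: Vec<String>,        // theorems whose proofs reference this one
    }

    #[derive(Serialize)]
    struct Hypothesis {
        label: String,
        expression: Vec<String>,
    }

    #[derive(Serialize)]
    struct ProofStep {
        hyps: Vec<usize>,        // indices of the earlier steps this step uses
        reference: String,       // label of the statement applied at this step
        expression: Vec<String>, // the resulting math string
    }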
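And for the "how many steps if expanded to axioms" query mentioned above, a
memoized count is cheap once the proofs are in memory. Here is a rough
sketch (recursive for clarity), assuming the num-bigint crate - since the
answers overflow machine integers - and a hypothetical map from each $p
label to the labels referenced by its proof; the exact step accounting in
metamath-exe may differ:

    use std::collections::HashMap;

    use num_bigint::BigUint;

    // Hypothetical proof data: for each $p label, the labels referenced by
    // its proof, in proof order. Axioms and hypotheses do not appear as keys.
    type Proofs = HashMap<String, Vec<String>>;

    // Number of steps the proof would have if every theorem reference were
    // recursively expanded down to axioms and hypotheses.
    fn expanded_steps(
        label: &str,
        proofs: &Proofs,
        memo: &mut HashMap<String, BigUint>,
    ) -> BigUint {
        if let Some(n) = memo.get(label) {
            return n.clone();
        }
        let n = match proofs.get(label) {
            // An axiom or hypothesis counts as a single step.
            None => BigUint::from(1u32),
            // A theorem contributes the expanded size of each referenced step.
            Some(steps) => steps
                .iter()
                .map(|step| expanded_steps(step, proofs, memo))
                .sum(),
        };
        memo.insert(label.to_string(), n.clone());
        n
    }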
