> On Feb 18, 2023, at 4:08 PM, Mario Carneiro <[email protected]> wrote:
> In case it wasn't clear (and it probably wasn't), in a future world in which
> the default mechanism is some super fancy client side renderer it is still
> possible to offer a fallback "static site" of HTML pages which is
> nevertheless not generated in advance and is being served dynamically
> server-side. This has a relatively small generation cost even if we don't
> stick a CDN in front of it, but if we do then I think that's a viable
> approach to having all the HTML pages with basically zero preprocessing, just
> a smarter server. I would really love it if the only thing the site build
> needs to do is download and load the latest set.mm .
I understand, that's just a typical dynamically-generated data file and/or HTML
with possibly front-end JavaScript. Bog standard stuff. Of course, there are
pluses & minuses with that approach. In the 2010s almost everyone switched to
that model, today there's a big press back to statically-regenerated sites
where practical (with Jekyll, etc.) because attackers routinely pop systems
that do dynamic response.
There are big concerns if you hook up a program written in C to input sent by
an attacker. Rigorous input filtering would definitely help, but using
something like metamath-knife as a basis instead of metamath-exe might be
better. So doing this would involve a nontrivial amount of software, *along*
with analysis to ensure that it was unlikely to lead to a vulnerability.
Statically generating files is a much simpler path from where we are. That's
not to say we must go that way, but it'd take more effort to do full dynamic
generation *with* confidence in its security.
> I don't think this is a very competitive option though, since you would have
> to load all of set.mm.
It's more competitive than you might think. Igor's prover loads the whole
database in a few seconds. That said, I agree with:
> The advantage of pre-processed json files is that when you have a single
> theorem you want to look at you don't need to request everything and start up
> a whole client-side MM verifier first.
> What started this thread was trying to see if we could get "free" training on
> GPT-3 by presenting something easier for them to consume, *without*
> controlling them. I don't think generating JSON will do that. The JSON
> certainly could make it easier for organizations who want to specially train
> ML systems on the existing databases, and are willing to do "a little
> cleaning" by processing the JSON. If we pregenerate JSON from the .mm
> databases, the most likely use cases would be to make it easier for rendering
> tools & ML labs that want to do a little (but not a lot) of work.
>
> Has anyone asked for this?
Absolutely not!! I was noticing that Microsoft & Google are really doubling
down on web scanning into GPTs, and trying to see if we could take advantage of
that.
--- David A. Wheeler
--
You received this message because you are subscribed to the Google Groups
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/metamath/3E04D159-5A99-485B-8B1E-4C8524CD5D0E%40dwheeler.com.