> On 16 May 2022, at 18:50, Pablo Rodriguez via ntg-context <ntg-context@ntg.nl> wrote:
>
> On 5/16/22 17:30, Hans van der Meer via ntg-context wrote:
>> Can't you use an editor with grep, searching for something like the
>> pattern <meta.*^/>?
>
> Many thanks for your reply, dr. van der Meer.
>
> If I want to typeset the whole book
> (https://seumasjeltzz.github.io/LinguaeGraecaePerSeIllustrata/), I will
> have to download and sanitize over 20 HTML files.

Which can be done with a couple of command lines. xmllint usually does a
good job of cleaning up dodgy HTML input:

    xmllint --html --xmlout crappy.html > nice.xml

(As good as can be expected from a program, anyway.)

> It is really a pity that ConTeXt cannot totally ignore any given XML elements.

This statement is a little unfair: the problem is exactly that your input
is NOT proper XML. If it were proper XML, ConTeXt would not have problems
with it. ConTeXt explicitly has the capability to handle XML files, which
your input simply is not. In fact, it is sloppy HTML-esque data that modern
web browsers happen to be able to handle more or less correctly. It is not
valid HTML either, because valid HTML has to be valid SGML, which your
input clearly is not.

That said, tools like xmllint exist for this stuff. Just write a small
batch driver file in some scripting language ((power)shell, Lua, Python,
Perl, etc.) to preprocess the HTML stuff into clean XML, and you should be
fine.

Taco

--
Taco Hoekwater              E: t...@bittext.nl
genderfluid (all pronouns)
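
To illustrate the batch driver suggested in the message, here is a minimal Python sketch. It assumes that xmllint is on the PATH and that the downloaded pages sit in one directory with a .html extension. The script name html2xml.py and the function names are hypothetical; only the two xmllint flags come from the message itself.

```python
#!/usr/bin/env python3
"""Convert every *.html file in a directory to XML with xmllint (sketch)."""
import subprocess
import sys
from pathlib import Path


def xmllint_command(src):
    # Same flags as the xmllint one-liner in the message:
    # parse the input as HTML, serialize the result as XML.
    return ["xmllint", "--html", "--xmlout", str(src)]


def target_path(src, out_dir):
    # Hypothetical naming scheme: book/ch01.html -> out_dir/ch01.xml
    return Path(out_dir) / (Path(src).stem + ".xml")


def convert_all(in_dir, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.html")):
        dst = target_path(src, out_dir)
        with open(dst, "wb") as out:
            # xmllint writes the cleaned document to stdout,
            # which is redirected into the target file here.
            result = subprocess.run(xmllint_command(src), stdout=out)
        print(f"{src} -> {dst} (exit code {result.returncode})")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: html2xml.py INPUT_DIR OUTPUT_DIR")
    convert_all(sys.argv[1], sys.argv[2])
```

It would be run as, for example, `python3 html2xml.py html xml`. Parser complaints from xmllint still go to the terminal, so each converted file should be checked before it is fed to ConTeXt.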