Hi Neil,

Thanks heaps. I'll give this fix a go and let you know how it works ASAP.

That all makes sense re: avoiding breaking changes in guile-lib. If this fix works and is all that's needed, I'll use it instead of the version currently available in guile-lib.

With that in mind, if I were to choose one of the 'distributions' of htmlprag, is there one you yourself would pick? Or are the versions available in e.g. guile-lib and standalone the same for all intents and purposes?

Cheers,
Kenan

On 6/9/19 8:33 am, Neil Van Dyke wrote:
> Kenan, could you please try the below "one-line" change, and let me
> know what you think?
>
> (It's an attempt at a minimal fix for the problem you were seeing, and
> for some related problems with modern HTML. However, it breaks
> backward-compatibility relative to the htmlprag currently in
> guile-lib. For example, consider someone doing Web scraping of modern
> HTML, and their scraping code only works with the previous, invalid
> parse. I'm not yet familiar with guile-lib and how the htmlprag in it
> is being used, so I don't want to be too quick to suggest breaking
> changes to it.)
>
> (Historical note: htmlprag was mostly written 18 years ago, when HTML
> was different in both standards and practice. Today, I'd write the
> parser very differently, though I think there's a good chance that
> htmlprag will still work for one's purpose, with this change.)
>
> Neil
>
> --- htmlprag.scm.ORIG 2019-09-05 18:21:40.850220789 -0400
> +++ htmlprag.scm 2019-09-05 18:21:40.850220789 -0400
> @@ -1099,7 +1099,7 @@
>      (meta . (head))
>      (noframes . (frameset))
>      (option . (select))
> -    (p . (body td th))
> +    (p . (div blockquote body footer header li td th))
>      (param . (applet))
>      (tbody . (table))
>      (td . (tr))
> @@ -1989,6 +1989,13 @@
>      (t1 "<script>xxx" '((script "xxx")))
>      (t1 "<script/>xxx" '((script) "xxx"))
>
> +    (t1 "<div><p>x</p></div>" '((div (p "x"))))
> +    (t1 "<header><p>x</p></>" '((header (p "x"))))
> +    (t1 "<footer><p>x</p></>" '((footer (p "x"))))
> +    (t1 "<blockquote><p>x</p></blockquote>" '((blockquote (p "x"))))
> +    (t1 "<ul><li><p>x</p></li></ul>" '((ul (li (p "x")))))
> +    (t1 "<ol><li><p>x</p></li></ol>" '((ol (li (p "x")))))
> +
>      ;; TODO: Add verbatim-pair cases with attributes in the end tag.
>
>      (t2 '(p) "<p></p>")
>
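
For anyone following the thread, here is a rough sketch of why the one-line change in the quoted diff matters. It is a simplified model, not htmlprag's actual code: the names old-constraints, new-constraints, allowed-parent? and close-until-allowed are hypothetical, and it assumes the parser closes open elements until it reaches one listed as a permitted parent for the new element.

```scheme
;; Hypothetical, simplified model of a parent-constraint table.
;; Each entry maps an element to the parents it may appear under.
(define old-constraints
  '((p . (body td th))))
(define new-constraints
  '((p . (div blockquote body footer header li td th))))

;; Return #t if ELEM may be a direct child of PARENT under CONSTRAINTS.
;; Elements with no entry in the table are treated as unconstrained.
(define (allowed-parent? constraints elem parent)
  (let ((entry (assq elem constraints)))
    (or (not entry)
        (and (memq parent (cdr entry)) #t))))

;; OPEN-STACK lists the currently open elements, innermost first.
;; Drop (that is, implicitly close) open elements until the innermost
;; remaining one is a permitted parent of ELEM.
(define (close-until-allowed constraints elem open-stack)
  (cond ((null? open-stack) '())
        ((allowed-parent? constraints elem (car open-stack)) open-stack)
        (else (close-until-allowed constraints elem (cdr open-stack)))))

;; Seeing <p> while <div> is open inside <body>:
(close-until-allowed old-constraints 'p '(div body)) ; => (body)
(close-until-allowed new-constraints 'p '(div body)) ; => (div body)
```

Under the old table the div is closed before the p opens, so the p ends up as a sibling of the div rather than its child. That is the kind of parse the new t1 cases in the diff are meant to rule out.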