Re: How to parse html wild?

snej Wed, 13 May 2020 09:00:06 -0700

Browsers have _always_ supported “tag soup” HTML, going back to Mosaic and 
Netscape. Unless the content type is XHTML, you cannot expect any sort of 
valid structure. For parsing “wild” HTML, preprocessing it with a widely used 
tidier is probably the best bet, since its interpretation of bad markup is 
hopefully similar to a browser’s.
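
To give a feel for what tolerating tag soup looks like in code, here is a 
minimal sketch in Python. It does not use a tidier. It swaps in the standard 
library's lenient html.parser, which reports tags and text as events and does 
not reject unclosed or crossed tags. It also builds no tree and makes no 
attempt to reproduce a browser's error recovery. The class name TextAndLinks, 
the helper extract, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Hypothetical helper: collect link targets and text from messy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; unquoted values are accepted.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep only non-empty runs of text, with surrounding whitespace removed.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def extract(html):
    """Feed markup to the parser and return (links, text_chunks)."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()  # flush anything still buffered
    return parser.links, parser.text


# Illustrative tag soup: unclosed <p>, crossed <b>/<i>, unquoted attribute.
soup = "<p>Unclosed paragraph<b>bold <i>crossed</b> tags</i><a href=/docs>docs<p>next"
links, text = extract(soup)
print(links)
print(text)
```

This is enough for pulling out links or text, but anything that depends on 
the document's structure still needs a tidier or a parser that repairs the 
tree the way a browser would.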
