Browsers have _always_ supported “tag soup” HTML, going back to Mosaic and Netscape. Unless the content type is XHTML, you cannot expect any sort of valid structure. For parsing “wild” HTML, preprocessing it with a widely used tidier is probably the best bet, since its interpretation of bad markup is hopefully similar to a browser’s.
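
To make the content-type point concrete, here is a rough Python sketch. It only tries a strict XML parse when the declared type is XHTML, and otherwise falls back to the standard library's lenient `html.parser`. The names `TagCollector` and `list_tags` are made up for illustration, and `html.parser` is a stand-in here, not a tidier: it tolerates bad nesting but does not repair it the way a browser or a tidy tool would.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags in document order.

    html.parser does not raise on unclosed or mis-nested tags,
    so it can walk tag soup without a cleanup pass.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def list_tags(body, content_type="text/html"):
    """Return element names in document order (illustrative only)."""
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/xhtml+xml":
        # Only XHTML promises well-formed XML, and even then it may lie.
        try:
            root = ET.fromstring(body)
            # Drop the "{namespace}" prefix ElementTree adds to tag names.
            return [el.tag.rsplit("}", 1)[-1] for el in root.iter()]
        except ET.ParseError:
            pass  # fall through to the lenient path
    collector = TagCollector()
    collector.feed(body)
    collector.close()
    return collector.tags


soup = "<p>one<p>two <b>bold <i>both</b> italic</i>"
print(list_tags(soup))
```

Running the same input through a strict XML parser would fail at the first unclosed `<p>`, which is why the strict path is gated on the content type and still wrapped in a fallback.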
