So Zalewski's point is that, because of parser divergence, the only safe way to proxy HTML (and do any security filtering on it) is to parse the HTML with a parser that closely matches what browsers actually do, build an AST, and then reconstruct the HTML from that AST: drop all unknown tags, attributes, and unsafe constructs, and re-emit text in the safest possible encoding. Many people start from the assumption that something smaller will suffice, and they are wrong.
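As a toy illustration of the parse-then-reconstruct approach (not anyone's production code): the sketch below uses Python's stdlib html.parser for brevity, though in practice you would want a browser-grade parser such as html5lib. The allowlists are illustrative assumptions, not recommendations.

```python
# Sketch: parse untrusted HTML, then emit fresh markup containing only
# allowlisted tags and attributes; all text content is re-escaped.
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "li"}   # assumed allowlist
ALLOWED_ATTRS = {"a": {"href"}}                                    # assumed allowlist

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped entirely
        kept = [
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_ATTRS.get(tag, set())
            and not (value or "").lower().lstrip().startswith("javascript:")
        ]
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data))  # re-encode all text content

def sanitize(html_in):
    s = Sanitizer()
    s.feed(html_in)
    s.close()
    return "".join(s.out)
```

Note that even this sketch only renders script content inert by escaping it; a real sanitizer also has to deal with character-set tricks, comments, CDATA-like constructs, and the rest of the divergence zoo, which is exactly why reusing a browser's own parsing behavior matters.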
I presume something similar would be useful for HTTP as well, where there are similar ambiguities around repeated header lines, Content-Type, encodings, file attachments, quoted strings, and so on.

An interesting paper could be written on XML parser engines: how they operate, and what that means for langsec. The parser engine, and whether it works via callbacks (SAX-style) or builds a tree (DOM-style), combined with the semantics of parsing, creates divergences.

And oh, if you happen to be in the San Francisco Bay Area, we have a CFP out (deadline 1 Mar 2015), and we need volunteers and sponsors:
https://bsidessf.com/w/index.php/Main_Page

Cheers :-)
--
http://www.subspacefield.org/~travis/
Split a packed field and I am there; parse a line of text and you will find me.
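P.S. To make the repeated-header ambiguity concrete, here are two naive, illustrative header parsers (toy code, not any real implementation) that disagree about duplicate Content-Length lines; when a proxy and an origin server pick different answers, you get request smuggling.

```python
# Two plausible-but-naive HTTP header parsers that diverge on
# duplicate header lines.
def parse_first_wins(raw):
    headers = {}
    for line in raw.split("\r\n"):
        if ":" in line:
            name, _, value = line.partition(":")
            headers.setdefault(name.strip().lower(), value.strip())
    return headers

def parse_last_wins(raw):
    headers = {}
    for line in raw.split("\r\n"):
        if ":" in line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return headers

raw = "Content-Length: 10\r\nContent-Length: 0"
print(parse_first_wins(raw)["content-length"])  # prints 10
print(parse_last_wins(raw)["content-length"])   # prints 0
```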
_______________________________________________
langsec-discuss mailing list
langsec-discuss@mail.langsec.org
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss