travis+ml-lang...@subspacefield.org writes:

> So Zalewski says that because of parser divergence, the only safe way
> to proxy HTML (and do any security filtering) is to parse the HTML via
> a method that is widely used by browsers into an AST and reconstruct
> the HTML from the AST, ignoring all unknown tags and attributes and
> unsafe constructs (using the safest encoding, for example). Many
> people start from an assumption that something smaller is sufficient,
> but they are wrong.
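
To make the parse-then-rebuild idea concrete, here is a minimal sketch. It assumes Python's standard-library html.parser as a stand-in, which is not a browser-grade HTML5 parser, so it only shows the shape of the approach; a real filter would want a parser that follows the HTML5 algorithm (html5lib, for example). The allowlists and the names Rebuilder and sanitize are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists, chosen only for illustration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # drop these tags together with their text


class Rebuilder(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the reconstructed document
        self.open_tags = []  # allowlisted tags we have emitted and not closed
        self.skip = 0        # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        # Close only tags we opened ourselves, innermost first.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append("</%s>" % top)
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        # Close whatever the input left open, so the output is balanced.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def sanitize(markup):
    parser = Rebuilder()
    parser.feed(markup)
    parser.close()  # flush text still buffered by the parser
    return parser.result()
```

The point is that nothing from the input is passed through verbatim: every tag, attribute and text node in the output is generated by the filter from its own parse result.
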
For the record, an example of such a sanitizer implemented in Python:
<http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py>

> I presume something similar is useful for HTTP as well, where there's
> similar ambiguities in repeated header lines, content-type, encoding,
> file attachments, quoted strings, and so on.

Repeated header lines in HTTP are only allowed if they can be combined
into a legitimate comma-separated list value. See RFC 7230, Section 3.2.2:

> A sender MUST NOT generate multiple header fields with the same field
> name in a message unless either the entire field value for that
> header field is defined as a comma-separated list [i.e., #(values)]
> or the header field is a well-known exception (as noted below).

<https://tools.ietf.org/html/rfc7230#section-3.2.2>

> An interesting paper would be on XML parser engines, how they operate,
> and what that means for langsec. Because the parser engine and whether
> it works on callbacks or ASTs and so on, combined with the semantics
> of parsing, creates divergences.

Can you show any examples of XML parser engine differentials?

> And oh, if you happen to be in the San Francisco Bay Area, we have a
> CFP out (1 Mar 2015 deadline), need volunteers, and sponsors:
>
> https://bsidessf.com/w/index.php/Main_Page
>
> Cheers :-)
> --
> http://www.subspacefield.org/~travis/
> Split a packed field and I am there; parse a line of text and you will find
> me.
>
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss@mail.langsec.org
> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss

-- 
Nils Dagsson Moskopp // erlehmann
<http://dieweltistgarnichtso.net>
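
P.S. On the repeated-header rule from RFC 7230, Section 3.2.2: the same section lets a recipient combine repeated fields by appending the values in order, separated by commas, and names Set-Cookie as the well-known exception. A rough sketch of what a strict intermediary could do with that follows. The function name combine_headers and the SINGLETON set are my own assumptions for illustration, not a list taken from the RFC.

```python
# Hypothetical set of fields this sketch refuses to see more than once.
SINGLETON = {"content-length", "content-type", "host"}


def combine_headers(fields):
    """Fold repeated header fields into comma-separated list values.

    fields: list of (name, value) pairs in the order they were received.
    Returns (combined, set_cookies): a dict keyed by lowercased field name,
    and the Set-Cookie values kept apart because they do not use list syntax.
    Raises ValueError instead of guessing when a singleton field repeats.
    """
    combined = {}
    set_cookies = []
    for name, value in fields:
        key = name.lower()      # field names are case-insensitive
        value = value.strip()   # drop optional surrounding whitespace
        if key == "set-cookie":
            set_cookies.append(value)
        elif key not in combined:
            combined[key] = value
        elif key in SINGLETON:
            raise ValueError("repeated singleton header field: %s" % name)
        else:
            combined[key] += ", " + value
    return combined, set_cookies
```

Rejecting the message is deliberate: if the filter picks the first Content-Length and the server behind it picks the last, that is exactly the kind of parser differential this thread is about.
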