I use html5lib like this:

    def sanitize(self, content):
        """
        Sanitize the content to avoid XSS and so
        """
        import html5lib
        from html5lib import sanitizer
        p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer)
        # strip the <html><head/><body> ... </body></html> wrapper that the
        # parser adds around the content (19 leading and 14 trailing characters)
        return p.parse(content).toxml()[19:-14]

When I sanitize "<IMG SRC="HTTP://WWW.G.COM/png.png" ALT="g">", I only
get back "<IMG  ALT="g">". The attribute SRC="HTTP://WWW.G.COM/png.png"
is lost!

How can I keep the SRC attribute from being dropped?
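
I have not confirmed the cause. URI schemes are case-insensitive, so "HTTP:" should be as valid as "http:", but if the uppercase scheme is what trips the sanitizer, a stopgap would be to lowercase it before sanitizing. Below is a minimal sketch under that assumption; lowercase_uri_schemes is my own hypothetical helper, not an html5lib function, and it only looks at src and href attributes:

```python
import re

# Hypothetical pre-processing step (not part of html5lib): match the scheme
# at the start of a src/href attribute value, e.g. the HTTP in SRC="HTTP://...".
_SCHEME_RE = re.compile(
    r'''(\b(?:src|href)\s*=\s*["']?)([a-z][a-z0-9+.-]*)(:)''',
    re.IGNORECASE,
)


def lowercase_uri_schemes(markup):
    """Return markup with the scheme of src/href URLs lowercased."""
    return _SCHEME_RE.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3),
        markup,
    )


print(lowercase_uri_schemes('<IMG SRC="HTTP://WWW.G.COM/png.png" ALT="g">'))
```

The result could then be passed to sanitize() as before. This is a regex over raw markup, so it is only a workaround and not a substitute for a fix in the sanitizer itself.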