I use html5lib like this:
    def sanitize(self, content):
        """
        Sanitize the content to avoid XSS and similar attacks.
        """
        import html5lib
        from html5lib import sanitizer
        p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer)
        # Strip the <html><head/><body>...</body></html> wrapper
        # that the parser adds around the fragment.
        return p.parse(content).toxml()[19:-14]
When I sanitize '<IMG SRC="HTTP://WWW.G.COM/png.png" ALT="g">', I only
get back '<IMG ALT="g">'. The attribute SRC="HTTP://WWW.G.COM/png.png"
is lost!
How can I keep the SRC attribute from being dropped?