As you may know, I actually *use* the HTML5 Sanitizer in my branch of
Instiki.
Recently, I found I was getting inconsistent results. To track down
the problem, I created the following test:
{
  "name": "quotes_in_attributes",
  "input": "<img src='foo' title='\"foo\" bar' />",
  "rexml": "<img src='foo' title='\"foo\" bar' />",
  "output": "<img title='\"foo\" bar' src='foo'/>"
}
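For anyone who wants to check the escaping by eye, here is a minimal Ruby sketch showing what the strings look like once the JSON is decoded. It is a stand-alone copy of the test case, not how the test suite itself loads it; the heredoc and the variable names are my own, and the "output" value is written with its double quotes backslash-escaped, as JSON requires:

```ruby
require 'json'

# Hypothetical stand-alone copy of the test case (not read from the real
# test data file). The single-quoted heredoc keeps the backslashes intact
# so that the JSON parser, not Ruby, interprets the \" escapes.
test_case_json = <<'EOS'
{
  "name": "quotes_in_attributes",
  "input": "<img src='foo' title='\"foo\" bar' />",
  "rexml": "<img src='foo' title='\"foo\" bar' />",
  "output": "<img title='\"foo\" bar' src='foo'/>"
}
EOS

test_case = JSON.parse(test_case_json)

# After decoding, each \" is a bare double quote inside the
# single-quoted title attribute.
puts test_case['input']
puts test_case['output']
```

The point is only that the title attribute is delimited by single quotes, so the double quotes inside it can stay literal in the expected output.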
sanitize_rexml passes the test, but sanitize_html and sanitize_xhtml
fail:
test_quotes_in_attributes(SanitizeTest)
[tests/test_sanitizer.rb:35:in `check_sanitization'
tests/test_sanitizer.rb:134:in `test_quotes_in_attributes']:
<"<img title='"foo" bar' src='foo'/>"> expected but was
<"<img title='&quot;foo" bar' src='foo'/>">.
It turns out that this is easily fixed by the following change in
tokenizer.rb:
   # This method replaces the need for "entityInAttributeValueState".
   def process_entity_in_attribute
-    entity = consume_entity(true)
+    entity = consume_entity()
     if entity
       @current_token[:data][-1][1] += entity
If I make this change, all tests (not just the sanitizer tests) pass.
Unfortunately, I don't really understand the tokenizer logic at this
point. I don't understand why changing "from_attribute=true" to
"from_attribute=false" (the default) fixes the problem, and I don't
understand what will *break* if we make that change. Nothing we
currently test for, apparently, but that's probably because our unit
tests are insufficient.
So ...
Somebody please come up with a unit test that will exercise the
"consume_entity(true)" logic and/or come up with a better fix for this
regression.
Or I could just commit this change and let people squeal when
something breaks later...