As you may know, I actually *use* the HTML5 Sanitizer in my branch of
Instiki.

Recently, I found I was getting inconsistent results. To track down
the problem, I created the following test:

  {
    "name": "quotes_in_attributes",
    "input": "<img src='foo' title='\"foo\" bar' />",
    "rexml": "<img src='foo' title='\"foo\" bar' />",
    "output": "<img title='&quot;foo&quot; bar' src='foo'/>"
  }

sanitize_rexml passes the test, but sanitize_html and sanitize_xhtml
fail:

test_quotes_in_attributes(SanitizeTest)
    [tests/test_sanitizer.rb:35:in `check_sanitization'
     tests/test_sanitizer.rb:134:in `test_quotes_in_attributes']:
<"<img title='&quot;foo&quot; bar' src='foo'/>"> expected but was
<"<img title='&amp;quot;foo&quot; bar' src='foo'/>">.
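
Reading that failure message backwards: the actual output is what you
would get if the attribute value reaching the serializer still held the
first reference as literal text (&quot;foo" bar) instead of the decoded
"foo" bar. A minimal sketch, using CGI.escapeHTML from Ruby's standard
library as a stand-in for whatever escaping the html5lib serializer
really does (that substitution, and the escape_attr helper, are my
assumptions):

```ruby
require 'cgi'

# Hypothetical stand-in for the serializer's attribute escaping.
# Assumption: the real serializer escapes & and " in a similar way.
def escape_attr(value)
  CGI.escapeHTML(value)
end

decoded   = '"foo" bar'        # both quotes decoded, as the test expects
undecoded = '&quot;foo" bar'   # first reference left behind as literal text

puts escape_attr(decoded)      # &quot;foo&quot; bar
puts escape_attr(undecoded)    # &amp;quot;foo&quot; bar
```

The second line matches the failing output exactly, which suggests the
first reference is not being decoded while the second one is.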

It turns out that this is easily fixed by the following change in
tokenizer.rb:

     # This method replaces the need for "entityInAttributeValueState".
     def process_entity_in_attribute
-       entity = consume_entity(true)
+      entity = consume_entity()
       if entity
         @current_token[:data][-1][1] += entity

If I make this change, all tests (not just the sanitizer tests) pass.

Unfortunately, I don't really understand the tokenizer logic at this
point. I don't understand why changing "from_attribute=true" to
"from_attribute=false" (the default) fixes the problem, and I don't
understand what will *break* if we make that change. Nothing we
currently test for, apparently, but that's probably because the unit
tests are insufficient.
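
My only guess comes from the HTML5 spec rather than from the tokenizer
code: inside an attribute value, a named reference that is *not*
terminated by a semicolon and is followed by an alphanumeric (or "=" in
the current text of the spec) is supposed to be left alone. Below is a
self-contained sketch of that rule. It is not html5lib's code; the
ENTITIES table and consume_named_entity are hypothetical names made up
for illustration.

```ruby
# Hypothetical, tiny entity table; a real tokenizer uses the full list.
ENTITIES = {
  'quot;' => '"', 'quot' => '"',
  'copy;' => "\u00A9", 'copy' => "\u00A9"
}.freeze

# Sketch of the spec rule for named character references.
# text is whatever follows the "&". Returns [replacement, chars_consumed],
# or nil when the "&" should be kept as literal text.
def consume_named_entity(text, from_attribute = false)
  # Longest matching entity name wins.
  name = ENTITIES.keys.sort_by { |k| -k.length }.find { |k| text.start_with?(k) }
  return nil unless name

  next_char = text[name.length]
  # Attribute-only exception: no trailing semicolon, and the next
  # character is alphanumeric or "=".
  if from_attribute && !name.end_with?(';') &&
     next_char && next_char.match?(/[A-Za-z0-9=]/)
    return nil
  end

  [ENTITIES[name], name.length]
end

p consume_named_entity('quot;foo', true)   # decoded: the semicolon is present
p consume_named_entity('copyfoo', false)   # decoded outside attributes
p consume_named_entity('copyfoo', true)    # nil: left as text in attributes
```

If that reading is right, "&quot;foo" should decode in both modes, and
only something like "&copyfoo" or "&copy=1" inside an attribute would
exercise the from_attribute branch. I have not verified that the Ruby
port's check matches this.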

So ...

Somebody please come up with a unit test that will exercise the
"consume_entity(true)" logic and/or come up with a better fix for this
regression.

Or I could just commit this change and let people squeal when
something breaks later...

