LGTM

http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js
File src/com/google/caja/plugin/html-sanitizer.js (right):

http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode217
src/com/google/caja/plugin/html-sanitizer.js:217: '(\')[^\']*(\'|$)' +  // 6, 7 = Single-quoted string
On 2012/03/20 20:16:08, felix8a wrote:
On 2012/03/20 17:26:36, MikeSamuel wrote:
> Do we get any benefit from doing ([\"\'])[\s\S]*?(\4|$) and avoid
> having two sets of quote groups?

I'm wary of backreferences because some regexp engines don't handle
them well. I'll add a TODO to look into this.

You're probably right.  I vaguely remember that back-references in at
least one version of Perl 5 caused the interpreter to globally forgo
optimization of any regular expression literals.
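For reference, the two alternatives discussed above can be sketched side by side. This is only an illustration: the patterns are simplified stand-ins rather than the actual html-sanitizer.js regex, the backreference is \1 here because the sketch has no preceding groups, and matchQuoted is a hypothetical helper.

```javascript
// Simplified, hypothetical patterns; not the real sanitizer regex.

// Variant A: one alternative per quote style, no backreference.
var SEPARATE = /(")[^"]*("|$)|(')[^']*('|$)/;

// Variant B: a single alternative; \1 must match whichever quote
// character opened the string, or the match runs to end of input.
var BACKREF = /(["'])[\s\S]*?(\1|$)/;

// Returns the text of the first quoted string found, or null.
function matchQuoted(re, s) {
  var m = re.exec(s);
  return m ? m[0] : null;
}

var mixed = 'x="a\'b" y';      // double-quoted value containing '
var unterminated = "x='abc";   // single quote never closed

var resultsA = [matchQuoted(SEPARATE, mixed), matchQuoted(SEPARATE, unterminated)];
var resultsB = [matchQuoted(BACKREF, mixed), matchQuoted(BACKREF, unterminated)];
```

On these two inputs both variants pick out the same substrings, so the choice is mainly about group count versus reliance on backreference support in the regexp engine.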

http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode325
src/com/google/caja/plugin/html-sanitizer.js:325: // slow case, need to parse attributes
On 2012/03/20 20:16:08, felix8a wrote:
On 2012/03/20 17:26:36, MikeSamuel wrote:
> Don't need to parse attributes on an end tag.

It's possible for someone to write </p foo="a>">, and if we don't
parse end-tag attributes here, the result would be sanitized
differently.  This might be a case that we don't care about; I'll add
a TODO.

Now that you mention it, I do remember something about special
handling for

  </select ...>
    <option>...</option>
  </select>

in HTML5.
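
To make the </p foo="a>"> case above concrete, here is a rough sketch of how an end-tag matcher that ignores attributes ends the tag at a different place than one that skips over quoted values. Both regexes and the endTagLength helper are hypothetical and are not taken from html-sanitizer.js.

```javascript
// Hypothetical matchers for illustration only.

// Naive: the end tag is everything up to the first '>'.
var NAIVE_END_TAG = /^<\/([a-z][a-z0-9]*)[^>]*>/i;

// Attribute-aware: quoted attribute values are consumed whole, so a
// '>' inside quotes does not terminate the tag.
var AWARE_END_TAG = /^<\/([a-z][a-z0-9]*)(?:[^>"']|"[^"]*"|'[^']*')*>/i;

// Returns how many characters of html the end tag occupies, or -1.
function endTagLength(re, html) {
  var m = re.exec(html);
  return m ? m[0].length : -1;
}

var input = '</p foo="a>">rest';
var naiveLen = endTagLength(NAIVE_END_TAG, input);  // tag ends inside the quoted value
var awareLen = endTagLength(AWARE_END_TAG, input);  // tag ends at the final '>'

// Text that each approach would treat as following the end tag.
var naiveRest = input.slice(naiveLen);
var awareRest = input.slice(awareLen);
```

With the naive matcher the leftover text starts with `">`, which would then be handled as character data; the attribute-aware matcher leaves only `rest`.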

http://codereview.appspot.com/4559048/
