LGTM
http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js
File src/com/google/caja/plugin/html-sanitizer.js (right):
http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode217
src/com/google/caja/plugin/html-sanitizer.js:217: '(\')[^\']*(\'|$)' +  // 6, 7 = Single-quoted string
On 2012/03/20 20:16:08, felix8a wrote:
> On 2012/03/20 17:26:36, MikeSamuel wrote:
> > Do we get any benefit from doing ([\"\'])[\s\S]*?(\4|$) and avoid having two
> > sets of quote groups?
>
> I'm wary of backreferences because some regexp engines don't handle them well.
> I'll add a TODO to look into this.
You're probably right. I vaguely remember that back-references in at least
one version of Perl 5 caused the interpreter to globally forgo
optimization of any regular expression literals.
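For illustration, here is a minimal sketch of the two approaches discussed above. The patterns are simplified and hypothetical, not the actual regexes in html-sanitizer.js (where the group numbers differ, hence the \4 in the suggestion); the helper name matchQuoted is also made up for this example.

```javascript
// Hypothetical, simplified patterns; NOT the real html-sanitizer.js regexes.

// Variant A: one alternative per quote style, no backreference.
// Groups 1, 2 = double-quoted delimiters; groups 3, 4 = single-quoted ones.
var TWO_GROUP = /^(?:(")[^"]*("|$)|(')[^']*('|$))/;

// Variant B: a single alternative; \1 refers back to whichever quote opened
// the string, so only one pair of delimiter groups is needed.
var BACKREF = /^(["'])[\s\S]*?(\1|$)/;

// Returns the quoted string at the start of input, or null if input does not
// start with a quote. An unterminated string runs to the end of input.
function matchQuoted(re, input) {
  var m = re.exec(input);
  return m ? m[0] : null;
}
```

Under these assumptions both variants should accept the same strings, including an unterminated one such as 'abc. The trade-off is then fewer capture groups in Variant B versus avoiding backreferences in Variant A.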
http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode325
src/com/google/caja/plugin/html-sanitizer.js:325: // slow case, need to parse attributes
On 2012/03/20 20:16:08, felix8a wrote:
> On 2012/03/20 17:26:36, MikeSamuel wrote:
> > Don't need to parse attributes on an end tag.
>
> It's possible for someone to write </p foo="a>">, and if we don't parse end-tag
> attributes here, the result would be sanitized differently. This might be a
> case that we don't care about, I'll add a TODO.
Now that you mention it, I do remember something about special handling for
</select ...>
<option>...</option>
</select>
in HTML5.
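To make the </p foo="a>"> case concrete, here is a rough sketch of how the end of that tag differs depending on whether quoted attribute values are skipped. Both function names are hypothetical, and the quote handling is a simplification (any quote character opens a quoted run); this is not the sanitizer's actual logic or a full HTML5 tokenizer.

```javascript
// Hypothetical helpers for illustration only.

// Naive scan: the tag is assumed to end at the first '>' after start.
function naiveTagEnd(html, start) {
  var i = html.indexOf('>', start);
  return i < 0 ? html.length : i + 1;
}

// Quote-aware scan: a '>' inside a quoted run does not end the tag.
function quoteAwareTagEnd(html, start) {
  var quote = null;  // the currently open quote character, if any
  for (var i = start; i < html.length; i++) {
    var ch = html.charAt(i);
    if (quote) {
      if (ch === quote) { quote = null; }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '>') {
      return i + 1;
    }
  }
  return html.length;  // unterminated tag: consume the rest of the input
}

var input = '</p foo="a>">text';
var naiveRest = input.slice(naiveTagEnd(input, 0));       // '">text'
var awareRest = input.slice(quoteAwareTagEnd(input, 0));  // 'text'
```

With the naive scan, the leftover "> would be emitted as text after the end tag, which is the difference in sanitized output mentioned above.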
http://codereview.appspot.com/4559048/