[Caja] Re: faster html-sanitizer.js (issue 4559048)

mikesamuel Thu, 22 Mar 2012 16:11:43 -0700

LGTM


http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js
File src/com/google/caja/plugin/html-sanitizer.js (right):

http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode217
src/com/google/caja/plugin/html-sanitizer.js:217: '(\')[^\']*(\'|$)' +  // 6, 7 = Single-quoted string
On 2012/03/20 20:16:08, felix8a wrote:
> On 2012/03/20 17:26:36, MikeSamuel wrote:
> > Do we get any benefit from doing ([\"\'])[\s\S]*?(\4|$) and avoid having two
> > sets of quote groups?
>
> I'm wary of backreferences because some regexp engines don't handle them well.
> I'll add a TODO to look into this.


You're probably right.  I vaguely remember that back-references in at
least one version of Perl 5 caused the interpreter to globally forgo
optimization of all regular expression literals.
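
For reference, the two shapes under discussion can be compared in a minimal
sketch. The patterns and the helper name quotedValue below are hypothetical
and simplified for illustration; they are not the sanitizer's actual regexp,
and the backreference is \1 here rather than \4 because the sketch has no
preceding groups.

```javascript
// Hypothetical, simplified patterns; not taken from html-sanitizer.js.

// Form A: one alternative per quote style, so no backreference is needed.
var TWO_GROUPS = /^(?:(")[^"]*("|$)|(')[^']*('|$))/;

// Form B: a single alternative that refers back to the opening quote.
var BACKREF = /^(["'])[\s\S]*?(\1|$)/;

// Returns the quoted run at the start of s, or null if s does not
// start with a quote character.
function quotedValue(re, s) {
  var m = re.exec(s);
  return m ? m[0] : null;
}

var samples = ['"a\'b" rest', "'it is' rest", "'unterminated", 'bare'];
for (var i = 0; i < samples.length; i++) {
  // Both forms are expected to agree on every sample.
  console.log(quotedValue(TWO_GROUPS, samples[i]) ===
              quotedValue(BACKREF, samples[i]));
}
```

Both forms accept an unterminated string by matching to the end of input, so
the choice between them is about engine behavior, not about what is matched.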

http://codereview.appspot.com/4559048/diff/11002/src/com/google/caja/plugin/html-sanitizer.js#newcode325
src/com/google/caja/plugin/html-sanitizer.js:325: // slow case, need to parse attributes
On 2012/03/20 20:16:08, felix8a wrote:
> On 2012/03/20 17:26:36, MikeSamuel wrote:
> > Don't need to parse attributes on an end tag.
>
> It's possible for someone to write </p foo="a>">, and if we don't parse end-tag
> attributes here, the result would be sanitized differently.  This might be a
> case that we don't care about, I'll add a TODO.


Now that you mention it, I do remember something about special
handling for

  </select ...>
    <option>...</option>
  </select>

in HTML5.
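
The </p foo="a>"> case can be illustrated with a minimal sketch. Both
functions and their patterns are hypothetical and written only for this
example; neither is the code in html-sanitizer.js. The point is that the
two approaches disagree on where the end tag stops.

```javascript
// Hypothetical helpers for illustration; not taken from html-sanitizer.js.

// Fast path: treat the first '>' as the end of the end tag.
function endTagNaive(html) {
  var m = /^<\/([a-z][a-z0-9]*)[^>]*>/i.exec(html);
  return m ? m[0] : null;
}

// Slow path: step over quoted attribute values, so a '>' inside quotes
// does not terminate the tag.
function endTagQuoteAware(html) {
  var m = /^<\/([a-z][a-z0-9]*)(?:[^>"']|"[^"]*"|'[^']*')*>/i.exec(html);
  return m ? m[0] : null;
}

var input = '</p foo="a>">text';
console.log(endTagNaive(input));       // </p foo="a>
console.log(endTagQuoteAware(input));  // </p foo="a>">
```

With the fast path, the leftover `">` would be emitted as text, which is the
difference in sanitized output that felix8a describes. For a plain </p> both
functions return the same match.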

http://codereview.appspot.com/4559048/
