I really like the interface/generator idea but not really the
implementation (XML parsing –even though I could live with this–,
generating XML –the HTML5 serialization algorithm [1] is fortunately
easy enough and interoperable–)

The HtmlSanitizer is a good idea, but the implementation is very weak
[2].

There's a very good HTML5 parser implemented in Java at validator.nu,
which is the exact same code as the HTML5 parser in Firefox (C++ code
generated from the Java code, AFAIK).
It can even be used client-side with GWT but it'd probably be overweight
for the common use-case of HTML sanitization, for which I actually don't
think there's a clean and safe way apart from browsers' own safeHtml
implementations [3]; the validator.nu parser could at least be used
(apart from compile-time template parsing) for server-side sanitization;
and client-side sanitization would benefit from a
"toStaticHTML+DOM-scraping" implementation for IE8+ (pass the HTML to
toStaticHTML, then give it to some innerHTML, possibly in an iframe for
sandboxing, and then filter/sanitize again the parsed DOM if needed,
before using innerHTML to retrieve the sanitized HTML)

In brief:
 - SafeHtml/SafeHtmlBuilder and EscapeUtils are OK
 - SafeHtmlTemplates could be improved, but is acceptable as-is as a
first version
 - HtmlSanitizer is probably not worth it in its current state

My 2c.

[1] http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
[2]
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
[3] http://msdn.microsoft.com/en-us/library/cc848922(VS.85).aspx


http://gwt-code-reviews.appspot.com/771801/diff/1/10
File user/src/com/google/gwt/safehtml/shared/EscapeUtils.java (right):

http://gwt-code-reviews.appspot.com/771801/diff/1/10#newcode21
user/src/com/google/gwt/safehtml/shared/EscapeUtils.java:21: public
final class EscapeUtils {
Why is this not called HtmlUtils? (wrt UriUtils)

http://gwt-code-reviews.appspot.com/771801/diff/1/10#newcode51
user/src/com/google/gwt/safehtml/shared/EscapeUtils.java:51: //
TODO(pdr): This needs to be profiled to see if it is useful or not.
See how Closure Library does this, and the note about performance in the
doc:
http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/string/string.js#443

http://gwt-code-reviews.appspot.com/771801/diff/1/12
File
user/src/com/google/gwt/safehtml/shared/OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml.java
(right):

http://gwt-code-reviews.appspot.com/771801/diff/1/12#newcode26
user/src/com/google/gwt/safehtml/shared/OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml.java:26:
public class OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml
It doesn't look any different from SafeHtmlString, so how about using
EscapeUtils.fromSafeConstant instead?

http://gwt-code-reviews.appspot.com/771801/show

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors

Reply via email to