I really like the interface/generator idea but not really the implementation (XML parsing –even though I could live with this–, generating XML –the HTML5 serialization algorithm [1] is fortunately easy enough and interoperable–)
The HtmlSanitizer is a good idea, but the implementation is very weak [2]. There's a very good HTML5 parser implemented in Java at validator.nu, which is the exact same code as the HTML5 parser in Firefox (C++ code generated from the Java code, AFAIK). It can even be used client-side with GWT but it'd probably be overweight for the common use-case of HTML sanitization, for which I actually don't think there's a clean and safe way apart from browsers' own safeHtml implementations [3]; the validator.nu parser could at least be used (apart from compile-time template parsing) for server-side sanitization; and client-side sanitization would benefit from a "toStaticHTML+DOM-scraping" implementation for IE8+ (pass the HTML to toStaticHTML, then give it to some innerHTML, possibly in an iframe for sandboxing, and then filter/sanitize again the parsed DOM if needed, before using innerHTML to retrieve the sanitized HTML) In brief: - SafeHtml/SafeHtmlBuilder and EscapeUtils are OK - SafeHtmlTemplates could be improved, but is acceptable as-is as a first version - HtmlSanitizer is probably not worth it in its current state My 2c. [1] http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments [2] http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 [3] http://msdn.microsoft.com/en-us/library/cc848922(VS.85).aspx http://gwt-code-reviews.appspot.com/771801/diff/1/10 File user/src/com/google/gwt/safehtml/shared/EscapeUtils.java (right): http://gwt-code-reviews.appspot.com/771801/diff/1/10#newcode21 user/src/com/google/gwt/safehtml/shared/EscapeUtils.java:21: public final class EscapeUtils { Why is this not called HtmlUtils? (wrt UriUtils) http://gwt-code-reviews.appspot.com/771801/diff/1/10#newcode51 user/src/com/google/gwt/safehtml/shared/EscapeUtils.java:51: // TODO(pdr): This needs to be profiled to see if it is useful or not. See how Closure Library does this, and the note about performance in the doc: http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/string/string.js#443 http://gwt-code-reviews.appspot.com/771801/diff/1/12 File user/src/com/google/gwt/safehtml/shared/OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml.java (right): http://gwt-code-reviews.appspot.com/771801/diff/1/12#newcode26 user/src/com/google/gwt/safehtml/shared/OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml.java:26: public class OnlyToBeUsedInGeneratedCodeStringBlessedAsSafeHtml It doesn't look any different from SafeHtmlString, so how about using EscapeUtils.fromSafeConstant instead? http://gwt-code-reviews.appspot.com/771801/show -- http://groups.google.com/group/Google-Web-Toolkit-Contributors
