LGTM. If we do much more work in this area, though, here's an alternative approach. Instead of working around Unicode obscurities one at a time, we could encode strings using a Java-like escaping scheme. For example, the character with code 0 would become \u0000, and a backslash (\) would become \\.
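To make the idea concrete, here is a rough sketch of what such an escaper might look like. The class and method names are made up for illustration and are not part of the patch or of GWT. I am also assuming that everything outside printable ASCII gets escaped, which is more aggressive than strictly necessary. XML-special characters such as < and & are left alone, on the assumption that the XML writer still handles those.

```java
// Hypothetical helper, not existing GWT code: a sketch of a Java-like escaping scheme.
final class JavaLikeEscaper {
  private JavaLikeEscaper() {}

  /** Doubles backslashes and hex-escapes every char outside printable ASCII. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\') {
        sb.append("\\\\");
      } else if (c < 0x20 || c > 0x7E) {
        // Works per UTF-16 code unit, so unpaired surrogates survive as well.
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Reverses escape(); rejects malformed escape sequences. */
  static String unescape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (c != '\\') {
        sb.append(c);
        i++;
        continue;
      }
      if (i + 1 >= s.length()) {
        throw new IllegalArgumentException("Dangling backslash at index " + i);
      }
      char next = s.charAt(i + 1);
      if (next == '\\') {
        sb.append('\\');
        i += 2;
      } else if (next == 'u' && i + 6 <= s.length()) {
        int code = 0;
        for (int k = 2; k < 6; k++) {
          int digit = Character.digit(s.charAt(i + k), 16);
          if (digit < 0) {
            throw new IllegalArgumentException("Bad hex digit at index " + (i + k));
          }
          code = code * 16 + digit;
        }
        sb.append((char) code);
        i += 6;
      } else {
        throw new IllegalArgumentException("Bad escape at index " + i);
      }
    }
    return sb.toString();
  }
}
```

Because the escaped form contains only printable ASCII, there is nothing left for the XML layer to mangle, and unescape(escape(s)) should give back s for any Java string.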
Going that way, reading and writing these XML files would be lossless. Additionally, we should have fewer questions about whether we have nailed the last obscure Unicode problem or if more are lurking.

http://gwt-code-reviews.appspot.com/126813/diff/1/2
File dev/core/src/com/google/gwt/core/ext/soyc/impl/SizeMapRecorder.java (right):

http://gwt-code-reviews.appspot.com/126813/diff/1/2#newcode153
Line 153: continue;
These continues are implied, aren't they? They could be left out to make the code shorter.

http://gwt-code-reviews.appspot.com/126813

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors
