[gwt-contrib] Re: Add MD5 implementation, String byte manipulation methods/constructors. (issue516801)

t . broyer Wed, 12 May 2010 19:11:11 -0700


http://gwt-code-reviews.appspot.com/516801/diff/1/4
File user/super/com/google/gwt/emul/java/lang/String.java (right):


http://gwt-code-reviews.appspot.com/516801/diff/1/4#newcode458
user/super/com/google/gwt/emul/java/lang/String.java:458: int n =
str.length();
There's actually a much easier and probably faster way, provided you can
then turn the resulting String into a byte[]:
   unescape(encodeURIComponent(str))

encodeURIComponent will %-escape each char using UTF-8, and unescape
will unescape each %hh sequence. This would turn, e.g. "\u00E9" into
"\u00C3\u00A9". You could then get the bytes just like getChars(), which
is still probably faster than all those if/else, bit-shifts et al. to do
UTF-8 encoding yourself.
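
For illustration, a minimal sketch of that encoding trick in plain JavaScript. The function name getBytesUtf8 is borrowed from the patch, but the body is only an assumption about how the suggestion could be written, not the code under review. It returns unsigned values (0-255); a Java byte[] would still need them converted to signed bytes.

```javascript
// Hypothetical sketch, not the patch's implementation.
function getBytesUtf8(str) {
  // encodeURIComponent %-escapes every non-ASCII char as its UTF-8 bytes;
  // unescape then turns each %hh sequence into one char in the 0-255 range.
  // Note: encodeURIComponent throws a URIError on lone surrogates.
  var binary = unescape(encodeURIComponent(str));
  var bytes = [];
  for (var i = 0; i < binary.length; i++) {
    // Each char code is now a single UTF-8 byte value (unsigned).
    bytes.push(binary.charCodeAt(i));
  }
  return bytes;
}
```

With this sketch, "\u00E9" yields the two values 0xC3 and 0xA9, matching the example above.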

http://gwt-code-reviews.appspot.com/516801/diff/1/4#newcode515
user/super/com/google/gwt/emul/java/lang/String.java:515: private static
String utf8ToString(byte[] bytes, int ofs, int len) {
Same as above re. getBytesUtf8:
   decodeURIComponent(escape(bytes))
would turn "\xC3\xA9" into "\u00E9". Building the "\xC3\xA9" string from
a byte[] should be as easy as String.fromCharCode.apply(null,
bytes.slice(ofs, ofs + len)), as in valueOf. (Use slice rather than
splice: splice would remove the elements from the caller's array.)
If the String constructed from the bytes contains chars outside the
0-255 range (such as when a byte[] array contains non-byte values, which
could happen in web mode), then escape() will turn them into %uHHHH
escapes and decodeURIComponent then throws; this can be fixed upfront as
in getBytesLatin1 or afterwards with a .replace(/%u([0-9a-f]{4})/gi,
function(...
I don't have numbers but I think it'd be faster than the "manual UTF-8"
algorithm as done here.
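
A minimal sketch of the decoding direction, again in plain JavaScript. The name utf8ToString and its parameters come from the patch; the body is an assumption about how the suggestion might look. Masking each value with 0xFF is one possible form of the "fix upfront" mentioned above, and it also copes with signed byte values.

```javascript
// Hypothetical sketch, not the patch's implementation.
function utf8ToString(bytes, ofs, len) {
  var codes = [];
  for (var i = ofs; i < ofs + len; i++) {
    // Force every value into 0-255 so escape() never emits %uHHHH.
    codes.push(bytes[i] & 0xFF);
  }
  // Build the "binary" string, one char per byte.
  var binary = String.fromCharCode.apply(null, codes);
  // escape() %-escapes each byte; decodeURIComponent reads them as UTF-8.
  // Malformed UTF-8 input makes decodeURIComponent throw a URIError.
  return decodeURIComponent(escape(binary));
}
```

One caveat with this approach: very large arrays may exceed the engine's limit on arguments passed through apply, so a real implementation would probably need to process the input in chunks.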

http://gwt-code-reviews.appspot.com/516801/show

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors
