Hi Sherman,
On 2017-12-09 00:09, Xueming Shen wrote:
> Hi,
>
> Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which
> also includes cleanup/improvement work in java.lang.StringCoding.java
> to speed up general String coding performance, especially for UTF8).
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8184947
> webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev
I've not fully reviewed this yet, but something struck me halfway
through: as the ASCII fast-path is what's really important here, we
could write that part without ever having to go via a
StringCoding.Result.

On four of your ZipCodingBM micros this improves performance a bit
further (~10%):
diff -r 848591d85052 src/java.base/share/classes/java/lang/StringCoding.java
--- a/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:48:21 2017 +0100
+++ b/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:55:38 2017 +0100
@@ -937,7 +937,13 @@
      * Throws iae, instead of replacing, if malformed or unmmappble.
      */
     static String newStringUTF8NoRepl(byte[] bytes, int off, int len) {
-        Result ret = decodeUTF8(bytes, off, len, false);
+        if (COMPACT_STRINGS && !hasNegatives(bytes, off, len)) {
+            return new String(Arrays.copyOfRange(bytes, off, off + len), LATIN1);
+        }
+        Result ret = decodeUTF8_0(bytes, off, len, false);
         return new String(ret.value, ret.coder);
     }
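
For anyone reading along outside java.lang, here is a minimal sketch of
the same idea using only public APIs. It is not the patch itself:
AsciiFastPathSketch and decode are hypothetical names, hasNegatives is a
plain-loop stand-in for the JDK-internal method, and the slow path simply
uses the public UTF-8 decoder (which replaces malformed input rather than
throwing as newStringUTF8NoRepl does).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or the webrev.
class AsciiFastPathSketch {
    // Stand-in for the internal StringCoding.hasNegatives: returns true if
    // any byte in [off, off + len) has its high bit set (i.e. is non-ASCII).
    static boolean hasNegatives(byte[] bytes, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (bytes[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Pure-ASCII input is decoded directly; anything else takes the
    // general UTF-8 path.
    static String decode(byte[] bytes, int off, int len) {
        if (!hasNegatives(bytes, off, len)) {
            // ASCII bytes map one-to-one onto Latin-1, so no UTF-8
            // decoding step is needed.
            return new String(bytes, off, len, StandardCharsets.ISO_8859_1);
        }
        return new String(bytes, off, len, StandardCharsets.UTF_8);
    }
}
```

Zip entry names are typically plain ASCII, so in a sketch like this
nearly every call would return from the first branch.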
Before:

Benchmark               Mode  Cnt   Score   Error  Units
ZipCodingBM.jf_entries  avgt   25  43.682 ± 0.656  us/op
ZipCodingBM.jf_stream   avgt   25  42.075 ± 0.444  us/op
ZipCodingBM.zf_entries  avgt   25  43.323 ± 0.572  us/op
ZipCodingBM.zf_stream   avgt   25  40.237 ± 0.604  us/op

After:

Benchmark               Mode  Cnt   Score   Error  Units
ZipCodingBM.jf_entries  avgt   25  37.551 ± 1.198  us/op
ZipCodingBM.jf_stream   avgt   25  38.065 ± 0.628  us/op
ZipCodingBM.zf_entries  avgt   25  37.595 ± 0.686  us/op
ZipCodingBM.zf_stream   avgt   25  35.734 ± 0.442  us/op

(I don't know which jar you're using as test.jar, but the results seem
consistent across a few different ones.)

The gain comes from not going via the ThreadLocal<StringCoding.Result>
resultCache, which the perfasm output confirms.

I'm a bit skeptical of ThreadLocal caching optimizations for such small
objects (StringCoding.Result), and wonder if there's something else we
can do to help the optimizer out here, possibly eliminating the
allocation entirely.
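
To make the comparison concrete, a rough sketch of the two patterns I
have in mind. All names here are hypothetical: Result is only a stand-in
for StringCoding.Result, and CACHE for resultCache; the real code differs.
Whether the JIT actually removes the allocation in the second variant
depends on inlining and escape analysis, so treat that as something to
measure, not a given.

```java
// Hypothetical illustration class; not the actual StringCoding code.
class ResultCacheSketch {
    // Stand-in for StringCoding.Result: a small mutable holder.
    static final class Result {
        byte[] value;
        byte coder;

        Result with(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
            return this;
        }
    }

    // Stand-in for resultCache: one reusable holder per thread.
    static final ThreadLocal<Result> CACHE = ThreadLocal.withInitial(Result::new);

    // Pattern A: reuse the per-thread holder. This saves an allocation but
    // pays for a ThreadLocal lookup on every call.
    static Result cached(byte[] src) {
        return CACHE.get().with(src.clone(), (byte) 0);
    }

    // Pattern B: allocate a fresh, short-lived holder on every call. If the
    // caller is inlined and the holder does not escape, the optimizer may
    // be able to eliminate this allocation.
    static Result fresh(byte[] src) {
        return new Result().with(src.clone(), (byte) 0);
    }
}
```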
Thanks!
/Claes