Hi Steven,
This looks like a promising patch. If all goes well I can sponsor this
patch into hotspot-rt.
I have problems building your test. Could you send a stand-alone test
that can be build with plain JDK (i.e., exclude things like
com.google.common. and org.openjdk.jmh)?
Also, I saw that you tested with strings that are the same length as
UUID, which are 36 chars long:
2544f85f-1845-42c4-aba1-7a431e6edb99
eff61458-32f8-45cd-a77d-fddb14951cab
d685f1a2-0b95-4021-b9b3-b0fb3c02a17f
We do need to handle strings with shorter length, and making calls to
memcmp() may cause degradation with lots of short strings. Can you add a
sub-test that tests for strings length <= 4?
I would also recommend running a few non-trivial apps (such as eclipse)
and dump the interned string table to see the distribution of lengths. I
know this is a hassle, but it would be a good last step to wrap up your
work.
Meanwhile, I will try to run our benchmarks (aka "refworkload") to check
for any performance impact.
Thanks
- Ioi
On 08/04/2013 10:21 PM, Steven Schlansker wrote:
On Tue, 11 Jun 2013 10:28:14 +0100
Alan Bateman <alan.bate...@oracle.com> wrote:
On 10/06/2013 19:06, Steven Schlansker wrote:
Hi core-libs-dev,
While doing performance profiling of my application, I discovered
that nearly 50% of the time deserializing JSON was spent within
String.intern(). I understand that in general interning Strings is
not the best approach for things, but I think I have a decent use
case -- the value of a certain field is one of a very limited
number of valid values (that are not known at compile time, so I
cannot use an Enum), and is repeated many millions of times in the
JSON stream.
Have you run with -XX:+PrintStringTableStatistics? Might be
interesting if you can share the output (it is printed just before
the VM terminates).
There are also tuning knobs such as StringTableSize and would be
interesting to know if you've experimented with.
-Alan.
Hi everyone,
Thanks again for your useful advice. I definitely misjudged the
difficulty and complexity of working with methods that directly bridge
the Java <-> C++ "gap"! As such, it took me way longer to get to this
than I expected...
That said, I think I have very good results to report. OpenJDK8 is
already *significantly* better than OpenJDK 7 was, but still can be
better.
Here is the patch I have at the moment:
https://gist.github.com/stevenschlansker/6153643
I used oprofile with oprofile-jit to identify the hot spots in the
benchmark code as being java_lang_String::equals() and
java_lang_String::as_unicode_string.
Both of these methods have hand-written loops that copy or compare
jchar arrays.
The problem is that in -fastdebug or -slowdebug releases, this is
one method call per character (the function is not inlined). Even in
-release builds, it seems that this is significantly slower than the
libc memcpy() or memcmp() functions which can use SSE4 (or other
related technologies).
My patch adds new methods, char_cmp and char_cpy, which delegate to the
mem* functions instead of using hand-written loops.
The micro-benchmark results are very good for such a small change.
On fastdebug, before:
Benchmark Mode Thr Cnt
Sec Mean Mean error Units
o.s.b.InternBenchmark.testLongStringChmIntern sample 1 2819
5 1.780 0.184 msec/op
o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 343
5 14.571 0.310 msec/op
o.s.b.InternBenchmark.testLongStringNoIntern sample 1 8712
5 0.526 0.138 msec/op
o.s.b.InternBenchmark.testShortStringChmIntern sample 1 4427
5 1.133 0.121 msec/op
o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 603
5 8.319 0.161 msec/op
o.s.b.InternBenchmark.testShortStringNoIntern sample 1 17185
5 0.274 0.048 msec/op
After:
Benchmark Mode Thr Cnt
Sec Mean Mean error Units
o.s.b.InternBenchmark.testLongStringChmIntern sample 1 2898
5 1.812 0.208 msec/op
o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 1138
5 4.397 0.136 msec/op
o.s.b.InternBenchmark.testLongStringNoIntern sample 1 9035
5 0.519 0.146 msec/op
o.s.b.InternBenchmark.testShortStringChmIntern sample 1 4538
5 1.094 0.107 msec/op
o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 1363
5 3.686 0.100 msec/op
o.s.b.InternBenchmark.testShortStringNoIntern sample 1 16686
5 0.316 0.081 msec/op
On release, before:
Benchmark Mode Thr Cnt
Sec Mean Mean error Units
o.s.b.InternBenchmark.testLongStringChmIntern sample 1 4030
5 1.240 0.002 msec/op
o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 1024
5 4.894 0.042 msec/op
o.s.b.InternBenchmark.testLongStringNoIntern sample 1 20000
5 0.185 0.002 msec/op
o.s.b.InternBenchmark.testShortStringChmIntern sample 1 6143
5 0.814 0.005 msec/op
o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 1852
5 2.702 0.016 msec/op
o.s.b.InternBenchmark.testShortStringNoIntern sample 1 20000
5 0.102 0.001 msec/op
After:
Benchmark Mode Thr Cnt
Sec Mean Mean error Units
o.s.b.InternBenchmark.testLongStringChmIntern sample 1 4040
5 1.236 0.002 msec/op
o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 2733
5 1.832 0.010 msec/op
o.s.b.InternBenchmark.testLongStringNoIntern sample 1 20000
5 0.181 0.002 msec/op
o.s.b.InternBenchmark.testShortStringChmIntern sample 1 6170
5 0.809 0.001 msec/op
o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 3577
5 1.396 0.007 msec/op
o.s.b.InternBenchmark.testShortStringNoIntern sample 1 20000
5 0.102 0.000 msec/op
This is almost a 3.5x improvement on fastdebug builds, and more than a
2.5x improvement on release builds. It is now only ~50% slower than
ConcurrentHashMap, at least for the low-contention case! (I did not
benchmark higher numbers of threads thoroughly, but I do not think that
my changes could make that worse...)
Finally, the benchmark code:
https://github.com/stevenschlansker/jvm-intern-benchmark/blob/master/src/main/java/org/sugis/benchmark/InternBenchmark.java
It is not the best ever benchmark, but I'm hopeful that it's "good
enough" to demonstrate the wins I have. Please let me know if you
believe the benchmark invalidates my conclusions. It is run with JMH,
as that seems to be the standard way of doing things around here.
Thank you again for your time and input; I am hopeful that I have not
erred terribly :-)
Best,
Steven