As Eamonn writes it, the method will never cache miss but may frequently
branch mispredict (possibly several times per call). If you do a shift + mask
+ index into a small table, it will cache miss most of the time but never
branch mispredict. (In a real program it will cache miss frequently, since
thread state calls are infrequent and the lookup table will fall out of cache;
in a microbenchmark it will almost never cache miss, as the lookup table will
be hot.)
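For concreteness, here is a rough sketch of what a shift + mask + index version might look like. It is not the code from the webrev: the class name, the helper methods, and the way the five state bits are packed into a table index are all made up for illustration. The bit values are the ones listed in the JVM TI GetThreadState documentation.

```java
// Hypothetical sketch of a branch-free lookup; not the code under review.
final class ThreadStateTable {
    // Bit values as listed in the JVM TI GetThreadState documentation.
    static final int JVMTI_THREAD_STATE_TERMINATED = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    // Five interesting bits give 2^5 = 32 table entries.
    private static final Thread.State[] TABLE = new Thread.State[32];

    static {
        // Fill every entry once, using the branching version as the reference.
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = slowPath(expand(i));
        }
    }

    // Packs bits 1-2, 4-5 and 10 of threadStatus into a dense 5-bit index.
    static int compress(int threadStatus) {
        return ((threadStatus >>> 1) & 0x03)
             | ((threadStatus >>> 2) & 0x0C)
             | ((threadStatus >>> 6) & 0x10);
    }

    // Inverse of compress(); used only while filling the table.
    static int expand(int index) {
        return ((index & 0x03) << 1) | ((index & 0x0C) << 2) | ((index & 0x10) << 6);
    }

    // The if-chain, run only at class initialization.
    static Thread.State slowPath(int threadStatus) {
        if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
            return Thread.State.RUNNABLE;
        } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
            return Thread.State.BLOCKED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
            return Thread.State.TIMED_WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
            return Thread.State.WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
            return Thread.State.TERMINATED;
        } else {
            return Thread.State.NEW;
        }
    }

    // Hot path: shifts, masks and one array load, no conditional branches.
    static Thread.State toThreadState(int threadStatus) {
        return TABLE[compress(threadStatus)];
    }

    public static void main(String[] args) {
        System.out.println(toThreadState(0x0005)); // ALIVE | RUNNABLE
        System.out.println(toThreadState(0x0401)); // ALIVE | BLOCKED_ON_MONITOR_ENTER
    }
}
```

The hot path trades the chain of compares for one dependent load from a 32-entry array, which is exactly where the cache-miss cost mentioned above would show up in a real program.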
On 12/4/2010 7:22 AM, Eamonn McManus wrote:
Hi Mandy,
This test:
    if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) == 1) {
is always false, since JVMTI_THREAD_STATE_RUNNABLE is 4. (NetBeans 7.0
helpfully flags this; I'm not sure if earlier versions do.)
But, once corrected, I think you could use this idea further to write a much
simpler and faster method, on these lines:
    public static Thread.State toThreadState(int threadStatus) {
        if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
            return RUNNABLE;
        } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
            return BLOCKED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
            return TIMED_WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
            return WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
            return TERMINATED;
        } else {
            return NEW;
        }
    }
You could tweak the order of the tests based on what might be the relative
frequency of the different states but it probably isn't worth it.
Regards,
Éamonn
On 3/12/10 11:52 PM, Mandy Chung wrote:
Fix for 6977034: Thread.getState() very slow
Webrev at:
http://cr.openjdk.java.net/~mchung/6977034/webrev.00/
This is an improvement to map a Thread's threadStatus field to Thread.State.
The VM updates the Thread.threadStatus field directly at state transition
with the value as defined in JVM TI [1]. The java.lang.Thread.getState()
implementation can directly access the threadStatus value and do a direct
lookup from an array of Thread.State. The threadStatus value is a bit vector
and we would have to create an array of a minimum of 1061 (0x425) elements
to do direct mapping. I took the approach of using the highest-order bit
set to 1 in the masked threadStatus value as the index to the Thread.State
element, so that only 32 elements are cached (could be fewer). I wrote a
micro-benchmark measuring Thread.getState() on threads in different states
that shows a 1.7X to 6X speedup (see below). There is possibly some issue with
my micro-benchmark, since I didn't observe the 14X speedup that Doug did in his
experiment. However, I'd like to get this reviewed and pushed to the
repository so that anyone can experiment further with the performance
measurement.
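A minimal sketch of the highest-order-bit idea described above, for readers who have not opened the webrev. This is not the webrev code: the class name, STATE_MASK, and indexOf are made up for illustration, and the mask here simply ORs the five state bits tested in Eamonn's version, which may differ from the mask actually used in the fix. The bit values are the ones listed in the JVM TI GetThreadState documentation.

```java
import java.util.Arrays;

// Hypothetical sketch of indexing by the highest set bit; not the webrev code.
final class HighestBitStateTable {
    // Bit values as listed in the JVM TI GetThreadState documentation.
    static final int JVMTI_THREAD_STATE_TERMINATED = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    // Assumption: keep only the bits that decide the Thread.State.
    static final int STATE_MASK = JVMTI_THREAD_STATE_TERMINATED
            | JVMTI_THREAD_STATE_RUNNABLE
            | JVMTI_THREAD_STATE_WAITING_INDEFINITELY
            | JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT
            | JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER;

    // 32 entries as in the description; only indices 0..11 are reachable here.
    private static final Thread.State[] TABLE = new Thread.State[32];

    static {
        // Index 0 (no state bit set) and all unused slots map to NEW.
        Arrays.fill(TABLE, Thread.State.NEW);
        TABLE[indexOf(JVMTI_THREAD_STATE_TERMINATED)] = Thread.State.TERMINATED;
        TABLE[indexOf(JVMTI_THREAD_STATE_RUNNABLE)] = Thread.State.RUNNABLE;
        TABLE[indexOf(JVMTI_THREAD_STATE_WAITING_INDEFINITELY)] = Thread.State.WAITING;
        TABLE[indexOf(JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT)] = Thread.State.TIMED_WAITING;
        TABLE[indexOf(JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER)] = Thread.State.BLOCKED;
    }

    // Returns 0 for a zero value, otherwise (position of the highest set bit) + 1.
    static int indexOf(int masked) {
        return 32 - Integer.numberOfLeadingZeros(masked);
    }

    static Thread.State toThreadState(int threadStatus) {
        return TABLE[indexOf(threadStatus & STATE_MASK)];
    }

    public static void main(String[] args) {
        System.out.println(toThreadState(0x0005)); // ALIVE | RUNNABLE
        System.out.println(toThreadState(0x0191)); // ALIVE | WAITING | WAITING_INDEFINITELY | IN_OBJECT_WAIT
    }
}
```

Note that when several masked bits are set at once, this sketch lets the highest bit win, which is not the same priority order as an if-chain that tests RUNNABLE first.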
Thanks
Mandy
P.S. The discussion on this thread can be found at [2] [3].
[1] http://download.java.net/jdk7/docs/platform/jvmti/jvmti.html#GetThreadState
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-July/004567.html
[3] http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-August/004721.html
                 JDK 7 b120 (ms)   With fix (ms)   Speedup
main                       46465           22772      2.04
NEW                        50676           29921      1.69
RUNNABLE                   42202           14690      2.87
BLOCKED                    72773           12296      5.92
WAITING                    48811           13041      3.74
TIMED_WAITING              45737           12849      3.56
TERMINATED                 40314           16376      2.46