Re: Review request for 6977034 Thread.getState() very slow

David Holmes Sun, 05 Dec 2010 15:00:09 -0800

Brian Goetz said the following on 12/06/10 06:22:

It's worth noting that the real performance offender here is that thread states are enums (and therefore object references). If they were integers, then the entire thing could most likely be done in a way that is guaranteed branch-free and dcache-miss-free.
Agree with Eamonn that the current sequence-of-tests is probably good enough. Might be worth doing a little profiling to choose the optimal order of tests; I would guess that the right order is RUNNABLE, WAITING, TIMED_WAITING, BLOCKED, NEW, TERMINATED, but that's just an OOMH (out-of-my-hat) guess.

I had already had this discussion with Mandy, including the idea that a lookup table would be faster - if there were a simple way to construct it. The current order of tests is based on our discussion. No doubt RUNNABLE should be first, and NEW/TERMINATED last. The rest is somewhat subjective. As sync is used more than wait/notify, BLOCKED would be the next most likely. For waiting vs timed-waiting it depends on how defensively people code their waits :)

Anyway, this was all premised on slow performance that Doug observed as part of ForkJoin mechanics. It would be good to hear back from him on how this updated approach performs. If the FJ code has changed such that this is no longer an issue then I would suggest that Mandy's changes are "good enough" and we let her move on.


Cheers,
David

There are probably not enough bits here to justify a binary search (which trades off best-case against average/worst-case time, but might turn up a combination of bits that has better branch prediction characteristics).
On 12/5/2010 2:00 PM, Eamonn McManus wrote:
Yeah, it was a bit blithe of me to write that the sequence of tests was faster. In the table-lookup version, if you get rid of the initial test for RUNNABLE, and if you use Integer.numberOfLeadingZeros, and if the JIT compiler intrinsifies that to a native processor instruction, and if the lookup table is in L1 cache, then the table-lookup version will run in constant time and be better than the worst case of the sequence-of-tests version, and probably better than the average case too. But, as you say, that last /if/ (the cache hit) will usually not be true, and in that case I would not be surprised if the sequence of tests were faster even in its worst case.
Anyway the sequence-of-tests version is unquestionably simpler, and I would venture that the best solution is probably to go with that, plus a new method in the API that explicitly tests whether a thread is runnable. That's trivial to implement now that Mandy has pulled the knowledge of state bits into the Java code rather than being hidden in the bowels of the VM; and its implementation will be faster than (Thread.getState() == RUNNABLE) regardless of the implementation of the latter.
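
For illustration, a minimal sketch of the kind of runnable test proposed above. The class and method names (RunnableCheckSketch, isRunnable) are hypothetical and not part of any JDK API. The value 4 for JVMTI_THREAD_STATE_RUNNABLE is the one cited elsewhere in this thread. A real method would presumably read the thread's own threadStatus field instead of taking it as a parameter.

```java
// Hypothetical sketch; not an existing JDK API.
final class RunnableCheckSketch {
    // Value cited in this thread for the JVM TI runnable bit.
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;

    // A single mask-and-compare; no Thread.State enum value is produced.
    static boolean isRunnable(int threadStatus) {
        return (threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0;
    }
}
```

The check involves one bit test and no table, so it does not depend on how the full state mapping is implemented.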

Éamonn


On 5/12/10 8:27 AM, Brian Goetz wrote:
As Eamonn writes it, it will never cache miss but may frequently branch mispredict (possibly multiple times). If you do a shift + mask + index into a small table, it will cache miss most of the time but never branch mispredict. (In a real program it will cache miss frequently since thread state calls are infrequent and the lookup table will fall out of cache; in a microbenchmark it will almost never cache miss as the lookup table will be hot.)

On 12/4/2010 7:22 AM, Eamonn McManus wrote:
Hi Mandy,

This test:

         if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) == 1) {

is always false, since JVMTI_THREAD_STATE_RUNNABLE is 4. (NetBeans 7.0 helpfully flags this; I'm not sure if earlier versions do.)
But, once corrected, I think you could use this idea further to write a much simpler and faster method, on these lines:

     public static Thread.State toThreadState(int threadStatus) {
         if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
             return RUNNABLE;
         } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
             return BLOCKED;
         } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
             return TIMED_WAITING;
         } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
             return WAITING;
         } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
             return TERMINATED;
         } else {
             return NEW;
         }
     }
You could tweak the order of the tests based on what might be the relative frequency of the different states but it probably isn't worth it.
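
For readers who want to try the sequence-of-tests mapping outside the JDK, here is a self-contained sketch. The class name ThreadStateSketch and the helper buggyRunnableTest are hypothetical. The constant values are the ones the JVM TI specification gives for GetThreadState; they are an assumption here, since the thread itself only states that RUNNABLE is 4.

```java
// Illustrative sketch only; ThreadStateSketch is a hypothetical class name.
final class ThreadStateSketch {
    // Bit values assumed from the JVM TI GetThreadState documentation.
    static final int JVMTI_THREAD_STATE_TERMINATED = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    // Same order of tests as in the message above.
    static Thread.State toThreadState(int threadStatus) {
        if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
            return Thread.State.RUNNABLE;
        } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
            return Thread.State.BLOCKED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
            return Thread.State.TIMED_WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
            return Thread.State.WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
            return Thread.State.TERMINATED;
        } else {
            return Thread.State.NEW;
        }
    }

    // The test flagged above: masking with 4 yields 0 or 4, never 1,
    // so this returns false for every input.
    static boolean buggyRunnableTest(int threadStatus) {
        return (threadStatus & JVMTI_THREAD_STATE_RUNNABLE) == 1;
    }
}
```

For example, a status of 0x0005 (alive and runnable) maps to RUNNABLE, while buggyRunnableTest(0x0005) is false.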

Regards,

Éamonn


On 3/12/10 11:52 PM, Mandy Chung wrote:
Fix for 6977034: Thread.getState() very slow

Webrev at:
http://cr.openjdk.java.net/~mchung/6977034/webrev.00/
This is an improvement to map a Thread's threadStatus field to Thread.State. The VM updates the Thread.threadStatus field directly at state transition with the value as defined in JVM TI [1]. The java.lang.Thread.getState() implementation can directly access the threadStatus value and do a direct lookup from an array of Thread.State. The threadStatus value is a bit vector and we would have to create an array of a minimum of 1061 (0x425) elements to do direct mapping. I took the approach of using the highest-order bit set to 1 in the masked threadStatus value as the index to the Thread.State element, caching only 32 elements (could be fewer). I wrote a micro-benchmark measuring the Thread.getState of a thread in different states that shows 1.7X to 6X speedup (see below). There is possibly some issue with my micro-benchmark, in that I didn't observe the 14X speedup that Doug did in his experiment. However, I'd like to get this reviewed and pushed to the repository so that anyone can do more experiments on the performance measurement.
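
The highest-set-bit lookup described above can be sketched roughly as follows. This is not the code from the webrev: the class name ThreadStateTableSketch, the mask and the index formula are assumptions chosen for illustration, and the constant values are taken from the JVM TI GetThreadState documentation. The sketch also assumes that at most one of the masked bits is set in a valid threadStatus.

```java
// Illustrative sketch only; not the webrev implementation.
final class ThreadStateTableSketch {
    // Bit values assumed from the JVM TI GetThreadState documentation.
    static final int JVMTI_THREAD_STATE_TERMINATED = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    // Keep only the bits that decide the Thread.State.
    static final int STATE_MASK = JVMTI_THREAD_STATE_TERMINATED
            | JVMTI_THREAD_STATE_RUNNABLE
            | JVMTI_THREAD_STATE_WAITING_INDEFINITELY
            | JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT
            | JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER;

    // 32 slots as in the description above; only a handful are populated.
    private static final Thread.State[] TABLE = new Thread.State[32];
    static {
        TABLE[indexOf(0)] = Thread.State.NEW;
        TABLE[indexOf(JVMTI_THREAD_STATE_TERMINATED)] = Thread.State.TERMINATED;
        TABLE[indexOf(JVMTI_THREAD_STATE_RUNNABLE)] = Thread.State.RUNNABLE;
        TABLE[indexOf(JVMTI_THREAD_STATE_WAITING_INDEFINITELY)] = Thread.State.WAITING;
        TABLE[indexOf(JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT)] = Thread.State.TIMED_WAITING;
        TABLE[indexOf(JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER)] = Thread.State.BLOCKED;
    }

    // 0 when no bit is set, otherwise 1 + position of the highest set bit.
    static int indexOf(int maskedStatus) {
        return 32 - Integer.numberOfLeadingZeros(maskedStatus);
    }

    // Mask, find the highest set bit, index the table: no branches in Java code.
    static Thread.State toThreadState(int threadStatus) {
        return TABLE[indexOf(threadStatus & STATE_MASK)];
    }
}
```

With these constants the largest index used is 11 (for the 0x0400 bit), which is consistent with the remark that fewer than 32 elements would do.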

Thanks
Mandy
P.S. The discussion on this thread can be found at [2] [3].

[1] http://download.java.net/jdk7/docs/platform/jvmti/jvmti.html#GetThreadState
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-July/004567.html
[3] http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-August/004721.html
                 JDK 7 b120 (in ms)    With fix (in ms)    Speed up
main                          46465               22772        2.04
NEW                           50676               29921        1.69
RUNNABLE                      42202               14690        2.87
BLOCKED                       72773               12296        5.92
WAITING                       48811               13041        3.74
TIMED_WAITING                 45737               12849        3.56
TERMINATED                    40314               16376        2.46
