On Aug 8, 2011, at 8:21 PM, Christian Thalinger wrote:

> 
> On Aug 8, 2011, at 6:39 PM, Charles Oliver Nutter wrote:
> 
>> On Mon, Aug 8, 2011 at 9:51 AM, Christian Thalinger
>> <christian.thalin...@oracle.com> wrote:
>>> Since I have the basic push-notification of CallSites I'm now looking into 
>>> push-notification of SwitchPoints:
>>> 
>>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>>> 
>>> Basically it should be the same, just needs some additional love in the 
>>> compiler.
>>> 
>>> I looked into JRuby's usage of SwitchPoints and it seems it has something 
>>> to do with constants.  Is there an existing benchmark that would benefit 
>>> from the SwitchPoint optimization?  Seph also seems to use SwitchPoints, 
>>> PHP.reboot does not (that's what grep tells me).
>> 
>> Yes, currently SwitchPoint is only used for constant lookup, since
>> constant modification invalidates globally. A good benchmark to use
>> would be this one:
>> 
>> bench/language/bench_const_lookup.rb <number of iters>
>> 
>> Here are numbers from a recent openjdk-osx-build, with and without
>> invokedynamic enabled:
>> 
>> WITHOUT:
>> 
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.057000   0.000000   0.057000 (  0.057000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.063000   0.000000   0.063000 (  0.064000)
>> 100k * 100 both                           0.060000   0.000000   0.060000 (  0.060000)
>> 100k * 100 both                           0.060000   0.000000   0.060000 (  0.060000)
>> 100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
>> 
>> WITH: (specify -Xinvokedynamic.constants=true to JRuby, or
>> -Djruby.invokedynamic.constants=true to JVM)
>> 
>> 100k * 100 nested const get               1.321000   0.000000   1.321000 (  1.321000)
>> 100k * 100 nested const get               1.311000   0.000000   1.311000 (  1.311000)
>> 100k * 100 nested const get               1.305000   0.000000   1.305000 (  1.305000)
>> 100k * 100 nested const get               1.293000   0.000000   1.293000 (  1.294000)
>> 100k * 100 nested const get               1.292000   0.000000   1.292000 (  1.293000)
>> 100k * 100 inherited const get            1.295000   0.000000   1.295000 (  1.295000)
>> 100k * 100 inherited const get            1.241000   0.000000   1.241000 (  1.241000)
>> 100k * 100 inherited const get            1.241000   0.000000   1.241000 (  1.241000)
>> 100k * 100 inherited const get            1.244000   0.000000   1.244000 (  1.244000)
>> 100k * 100 inherited const get            1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.280000   0.000000   1.280000 (  1.280000)
>> 100k * 100 both                           1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.229000   0.000000   1.229000 (  1.230000)
>> 100k * 100 both                           1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.248000   0.000000   1.248000 (  1.248000)
>> 
>> You can see there's some room for improvement :) The numbers should be
>> better with invokedynamic, since the SwitchPoint form has no active
>> guard.
> 
> That's perfect!  Let's see what numbers I can come up with...

Here are the numbers for JDK 7 b147, 7071307+7071653, and 
7071307+7071653+7071709:

7071307: MethodHandle bimorphic inlining should consider the frequency
7071653: JSR 292: call site change notification should be pushed not pulled 
7071709: JSR 292: switchpoint invalidation should be pushed not pulled 

JDK 7 b147:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               1.301000   0.000000   1.301000 (  1.176000)
100k * 100 nested const get               1.057000   0.000000   1.057000 (  1.057000)
100k * 100 nested const get               1.052000   0.000000   1.052000 (  1.052000)
100k * 100 nested const get               1.051000   0.000000   1.051000 (  1.052000)
100k * 100 nested const get               1.052000   0.000000   1.052000 (  1.052000)
100k * 100 inherited const get            1.188000   0.000000   1.188000 (  1.188000)
100k * 100 inherited const get            1.126000   0.000000   1.126000 (  1.126000)
100k * 100 inherited const get            1.125000   0.000000   1.125000 (  1.125000)
100k * 100 inherited const get            1.126000   0.000000   1.126000 (  1.126000)
100k * 100 inherited const get            1.130000   0.000000   1.130000 (  1.130000)
100k * 100 both                           1.214000   0.000000   1.214000 (  1.214000)
100k * 100 both                           1.134000   0.000000   1.134000 (  1.134000)
100k * 100 both                           1.134000   0.000000   1.134000 (  1.134000)
100k * 100 both                           1.135000   0.000000   1.135000 (  1.135000)
100k * 100 both                           1.135000   0.000000   1.135000 (  1.135000)

7071307+7071653:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.552000   0.000000   0.552000 (  0.522000)
100k * 100 nested const get               0.325000   0.000000   0.325000 (  0.325000)
100k * 100 nested const get               0.345000   0.000000   0.345000 (  0.345000)
100k * 100 nested const get               0.339000   0.000000   0.339000 (  0.338000)
100k * 100 nested const get               0.343000   0.000000   0.343000 (  0.343000)
100k * 100 inherited const get            0.477000   0.000000   0.477000 (  0.477000)
100k * 100 inherited const get            0.307000   0.000000   0.307000 (  0.308000)
100k * 100 inherited const get            0.309000   0.000000   0.309000 (  0.309000)
100k * 100 inherited const get            0.309000   0.000000   0.309000 (  0.309000)
100k * 100 inherited const get            0.307000   0.000000   0.307000 (  0.307000)
100k * 100 both                           0.486000   0.000000   0.486000 (  0.486000)
100k * 100 both                           0.346000   0.000000   0.346000 (  0.346000)
100k * 100 both                           0.340000   0.000000   0.340000 (  0.340000)
100k * 100 both                           0.347000   0.000000   0.347000 (  0.347000)
100k * 100 both                           0.340000   0.000000   0.340000 (  0.340000)

7071307+7071653+7071709:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.468000   0.000000   0.468000 (  0.438000)
100k * 100 nested const get               0.238000   0.000000   0.238000 (  0.238000)
100k * 100 nested const get               0.251000   0.000000   0.251000 (  0.251000)
100k * 100 nested const get               0.242000   0.000000   0.242000 (  0.242000)
100k * 100 nested const get               0.254000   0.000000   0.254000 (  0.254000)
100k * 100 inherited const get            0.403000   0.000000   0.403000 (  0.403000)
100k * 100 inherited const get            0.260000   0.000000   0.260000 (  0.260000)
100k * 100 inherited const get            0.255000   0.000000   0.255000 (  0.255000)
100k * 100 inherited const get            0.252000   0.000000   0.252000 (  0.252000)
100k * 100 inherited const get            0.254000   0.000000   0.254000 (  0.254000)
100k * 100 both                           0.384000   0.000000   0.384000 (  0.384000)
100k * 100 both                           0.227000   0.000000   0.227000 (  0.227000)
100k * 100 both                           0.221000   0.000000   0.221000 (  0.221000)
100k * 100 both                           0.233000   0.000000   0.233000 (  0.233000)
100k * 100 both                           0.238000   0.000000   0.238000 (  0.238000)

That's pretty nice but compared to non-indy it sucks:

JDK 7 b147:

$ jruby --server bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.271000   0.000000   0.271000 (  0.242000)
100k * 100 nested const get               0.065000   0.000000   0.065000 (  0.065000)
100k * 100 nested const get               0.052000   0.000000   0.052000 (  0.052000)
100k * 100 nested const get               0.052000   0.000000   0.052000 (  0.052000)
100k * 100 nested const get               0.051000   0.000000   0.051000 (  0.051000)
100k * 100 inherited const get            0.224000   0.000000   0.224000 (  0.224000)
100k * 100 inherited const get            0.053000   0.000000   0.053000 (  0.053000)
100k * 100 inherited const get            0.053000   0.000000   0.053000 (  0.053000)
100k * 100 inherited const get            0.054000   0.000000   0.054000 (  0.054000)
100k * 100 inherited const get            0.054000   0.000000   0.054000 (  0.054000)
100k * 100 both                           0.230000   0.000000   0.230000 (  0.230000)
100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)

Some assembly inspection showed that the performance difference between indy 
and non-indy comes mostly from out-of-line calls: call sites that fall off the 
threshold cliff (10-15 call sites) are not inlined.  When we rewrite the 
benchmark to loop more often (10M times) but do only 50 constant lookups per 
iteration, it gets interesting:

JDK 7 b147:

$ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1
                                              user     system      total        real
10M * 50 nested const get                37.918000   0.000000  37.918000 ( 37.844000)
10M * 50 nested const get                37.448000   0.000000  37.448000 ( 37.448000)
10M * 50 nested const get                36.845000   0.000000  36.845000 ( 36.845000)
10M * 50 nested const get                36.841000   0.000000  36.841000 ( 36.841000)
10M * 50 nested const get                36.864000   0.000000  36.864000 ( 36.864000)
10M * 50 inherited const get             37.907000   0.000000  37.907000 ( 37.907000)
10M * 50 inherited const get             37.117000   0.000000  37.117000 ( 37.117000)
10M * 50 inherited const get             37.399000   0.000000  37.399000 ( 37.399000)
10M * 50 inherited const get             37.555000   0.000000  37.555000 ( 37.555000)
10M * 50 inherited const get             37.640000   0.000000  37.640000 ( 37.640000)
10M * 50 both                            37.946000   0.000000  37.946000 ( 37.946000)
10M * 50 both                            37.928000   0.000000  37.928000 ( 37.928000)
10M * 50 both                            38.140000   0.000000  38.140000 ( 38.140000)
10M * 50 both                            38.186000   0.000000  38.186000 ( 38.186000)
10M * 50 both                            37.956000   0.000000  37.956000 ( 37.956000)

JDK 7 b147:

$ jruby --server bench_const_lookup.rb 1
                                              user     system      total        real
10M * 50 nested const get                 2.790000   0.000000   2.790000 (  2.756000)
10M * 50 nested const get                 2.576000   0.000000   2.576000 (  2.576000)
10M * 50 nested const get                 2.499000   0.000000   2.499000 (  2.499000)
10M * 50 nested const get                 2.501000   0.000000   2.501000 (  2.501000)
10M * 50 nested const get                 2.497000   0.000000   2.497000 (  2.497000)
10M * 50 inherited const get              2.556000   0.000000   2.556000 (  2.556000)
10M * 50 inherited const get              2.419000   0.000000   2.419000 (  2.419000)
10M * 50 inherited const get              2.419000   0.000000   2.419000 (  2.419000)
10M * 50 inherited const get              2.414000   0.000000   2.414000 (  2.414000)
10M * 50 inherited const get              2.418000   0.000000   2.418000 (  2.418000)
10M * 50 both                             2.546000   0.000000   2.546000 (  2.546000)
10M * 50 both                             2.419000   0.000000   2.419000 (  2.419000)
10M * 50 both                             2.417000   0.000000   2.417000 (  2.417000)
10M * 50 both                             2.414000   0.000000   2.414000 (  2.415000)
10M * 50 both                             2.421000   0.000000   2.421000 (  2.421000)

7071307+7071653+7071709:

$ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1 
                                              user     system      total        real
10M * 50 nested const get                 0.590000   0.000000   0.590000 (  0.560000)
10M * 50 nested const get                 0.466000   0.000000   0.466000 (  0.466000)
10M * 50 nested const get                 0.305000   0.000000   0.305000 (  0.305000)
10M * 50 nested const get                 0.310000   0.000000   0.310000 (  0.310000)
10M * 50 nested const get                 0.304000   0.000000   0.304000 (  0.303000)
10M * 50 inherited const get              0.461000   0.000000   0.461000 (  0.461000)
10M * 50 inherited const get              0.426000   0.000000   0.426000 (  0.426000)
10M * 50 inherited const get              0.353000   0.000000   0.353000 (  0.353000)
10M * 50 inherited const get              0.355000   0.000000   0.355000 (  0.355000)
10M * 50 inherited const get              0.356000   0.000000   0.356000 (  0.356000)
10M * 50 both                             0.459000   0.000000   0.459000 (  0.458000)
10M * 50 both                             0.435000   0.000000   0.435000 (  0.435000)
10M * 50 both                             0.363000   0.000000   0.363000 (  0.363000)
10M * 50 both                             0.360000   0.000000   0.360000 (  0.360000)
10M * 50 both                             0.364000   0.000000   0.364000 (  0.364000)

Well, that's really nice!  The compiler is able to optimize away all constant 
lookups because all guards in between are eliminated and it can prove that the 
constant is not used.  The method is basically empty except for a little JRuby 
boilerplate.  Now we need a real benchmark ;-)
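
For readers who have not used SwitchPoint, here is a minimal, single-threaded 
sketch of how a constant lookup can be cached behind one.  It is only an 
illustration under stated assumptions, not JRuby's actual implementation: the 
class ConstantCacheSketch and its members (constantValue, globalConstants, 
lookupAndCache, setConstant, get) are hypothetical names.  Only the 
java.lang.invoke calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch: one constant, one call site, one global SwitchPoint.
final class ConstantCacheSketch {
    // Stand-in for the language runtime's constant table (assumption).
    static Object constantValue = "v1";
    // Invalidated whenever any constant is modified (global invalidation).
    static SwitchPoint globalConstants = new SwitchPoint();
    // Counts how often the slow path ran, for demonstration only.
    static int slowLookups = 0;

    static final MethodHandle FALLBACK;
    static final MutableCallSite SITE;

    static {
        try {
            FALLBACK = MethodHandles.lookup().findStatic(
                    ConstantCacheSketch.class, "lookupAndCache",
                    MethodType.methodType(Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        // The site starts out unlinked: its first call takes the slow path.
        SITE = new MutableCallSite(FALLBACK);
    }

    // Slow path: look the value up, then rebind the site to a constant
    // handle guarded by the current SwitchPoint.
    static Object lookupAndCache() {
        slowLookups++;
        Object value = constantValue;
        MethodHandle fast = MethodHandles.constant(Object.class, value);
        SITE.setTarget(globalConstants.guardWithTest(fast, FALLBACK));
        return value;
    }

    // Constant modification: store the value, then invalidate the old
    // SwitchPoint so every site guarded by it falls back to the slow path.
    static void setConstant(Object newValue) {
        constantValue = newValue;
        SwitchPoint old = globalConstants;
        globalConstants = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }

    // What the call site executes on each constant access.
    static Object get() throws Throwable {
        return (Object) SITE.dynamicInvoker().invokeExact();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(get());   // slow path, caches "v1"
        System.out.println(get());   // fast path through the guard
        setConstant("v2");           // invalidates the cached value
        System.out.println(get());   // slow path again, caches "v2"
        System.out.println("slow lookups: " + slowLookups);
    }
}
```

While the SwitchPoint is valid, the guarded handle is just a constant, which 
is why the compiler can fold the lookup away once the guard itself costs 
nothing.  The sketch ignores thread safety and uses a single constant to keep 
the shape visible.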

-- Christian

> 
> -- Christian
> 
>> 
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> 
