Hi Ogata,

I finally managed to find some time to experiment with this. To measure invocation performance I created the following JMH benchmark [1]. It measures the invocation speed of instance and static methods using either:
- direct invocation (bytecodes)
- invocation via constant Method instance
- invocation via variable Method instance

Here are the results using unmodified JDK 14 build (baseline):

Benchmark                                         Mode  Cnt Score   Error  Units ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.272 ± 0.002  ns/op ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  16.609 ± 0.162  ns/op ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.715 ± 0.163  ns/op ReflectionSpeedBenchmark.staticDirect             avgt   10   2.275 ± 0.012  ns/op ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  16.351 ± 0.330  ns/op ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.259 ± 0.196  ns/op

Your webrev.04 [2] has a slight (~ 6%) improvement for constant Method instance (i.e. assigned to static final field):

Benchmark                                         Mode  Cnt Score   Error  Units ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.273 ± 0.003  ns/op ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  15.628 ± 0.115  ns/op ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.706 ± 0.144  ns/op ReflectionSpeedBenchmark.staticDirect             avgt   10   2.277 ± 0.008  ns/op ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  15.285 ± 0.109  ns/op ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.600 ± 0.222  ns/op

Now I have prepared another variant [3] that replaces DelegatingMethodAccessorImpl with SlowFastMethodAccessorImpl and produces the following result:

Benchmark                                         Mode  Cnt Score   Error  Units ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.371 ± 0.027  ns/op ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10   7.161 ± 0.066  ns/op ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.501 ± 0.154  ns/op ReflectionSpeedBenchmark.staticDirect             avgt   10   2.373 ± 0.017  ns/op ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10   6.971 ± 0.103  ns/op ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  15.893 ± 0.110  ns/op

This is more than twice as fast as the baseline for constant Method instances while not degrading performance for variable Method instances.


[1] http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/ReflectionSpeedBenchmark.java
[2] http://cr.openjdk.java.net/~ogatak/8229871/webrev.04/
[3] http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/webrev.01/


Could you spin this one [3] on your SPECjbb2015 benchmark to see if it still performs favorably?


Regards, Peter

On 10/11/19 12:17 PM, Kazunori Ogata wrote:
Hi Peter,

Thank you for the comment and suggestion of the fix.

I tried to pick up your change w.r.t. methodAccessor:
https://cr.openjdk.java.net/~ogatak/8229871/webrev.04/


Regarding micro benchmark, my original motivation of this change is to
improve performance of Class.getMethods(), which calls Method.copy() for
each declared method to create a copy of Method[].

I measured my simple microbench:
https://cr.openjdk.java.net/~ogatak/8229871/GetMethodsBench.java

Base code: Elapsed time = 4808 ms
webrev.01: Elapsed time = 4536 ms  (+   6%)
webrev.02: Elapsed time = 2331 ms  (+106%)
webrev.04: Elapsed time = 3746 ms  (+ 28%)

I'll measure larger benchmark and try to think if we can reduce the
overhead of memory barrier.


Regards,
Ogata


Peter Levart <peter.lev...@gmail.com> wrote on 2019/10/09 16:44:13:

From: Peter Levart <peter.lev...@gmail.com>
To: Kazunori Ogata <oga...@jp.ibm.com>, core-libs-dev@openjdk.java.net
Date: 2019/10/09 16:44
Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of
Method.copy() and leafCopy()

Hi Ogata,

May I just add that volatile field ensured that MethodAccessor object
was
correctly published. DelegatingMethodAccessortImpl is not safe to be
published via data race because it contains plain `delegate` field
initialized in the constructor. In addition, the object that is first
assigned to that field is NativeMethodAccessorImpl which has plain
`parent` field. You can get NPE when invoking the Method.invoke from
multuiple threads if Method.methodAccessor is not volatile.

In addition, It would be nice to have two microbenchmarks exercising:
a) Method copy performance
b) Method invocation performance

Regards, Peter

P.S. When exploring the possibility of an alternative MethodAccessor
implementation (using MethodHandle(s)):


http://cr.openjdk.java.net/~plevart/jdk-dev/6824466_MHReflectionAccessors/
webrev.00.2/

...I found out that it was possible to re-arrange the play between
DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and generated
MethodAccessor in such a way that the DelegatingMethodAccessortImpl
becomes safe to be published via data race. This allowed for
Method.methodAccessor field to become plain field. In addition this
field
can be made @Stable which further optimizes access to MethodAccessor
instance when Method instance can be constant-folded, which showed in
special microbenchmarks.

So perhaps you could try to use above implementation (just changes to
DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and part of
Reflection factory but without MH* stuff) and measure it against current
and your implementation (which as shown above has a data-race flaw).
On 10/8/19 12:23 PM, Kazunori Ogata wrote:
Hi all,

I posted two changes and got reply that performance evaluation is
needed.
I found that making Method.methodAccessor non-volatile (webrev.02) is
better than avoid copying methodAccessor value when it is null
(webrev.01), as shown below.

So I'd like to ask review of the former change.  I updated weberv using
the latest code base (though there was no difference from webrev.02):

Webrev: http://cr.openjdk.java.net/~ogatak/8229871/webrev.03/

Bug: https://bugs.openjdk.java.net/browse/JDK-8229871


For this performance evaluation, I calculated 75 percentile of 9 runs of
SPECjbb2015 and 60 percentile of 50 runs of DaCapo to omit outliers.  I
bound a JVM to a NUMA node and set the number of GC threads to the same
as
the number of physical cores.  These tuning reduced run-to-run
fluctuation
on POWER (as usual...).

SPECjbb2015:
   webrev.02:  critical jOPS +1.6%, max jOPS +0.2%
   webrev.01:  critical jOPS +0.4%, max jOPS +0.2%


For DaCapo, some benchmark still improved performance and some degraded,
but the geometric mean of all benchmarks were small:
   weberv.02: +0.3%
   weberv.01: +0.2%

The difference of improvement/degradation between the two changes in
each
benchmark were less than 0.8%.

The range of improvement/degradation in each benchmark were:
   webrev.02: between +2.4% and -1.0%
   webrev.01: between +1.6% and -1.8%


So I think webrev.02 (i.e., making methodAccessor non-volatile) is a
good
change, since it improved SPECjbb critical jOPS by 1.6%.


Regards,
Ogata


Kazunori Ogata/Japan/IBM wrote on 2019/08/27 15:41:39:

From: Kazunori Ogata/Japan/IBM
To: Mandy Chung <mandy.ch...@oracle.com>
Cc: core-libs-dev@openjdk.java.net
Date: 2019/08/27 15:41
Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() and
leafCopy()
Hi Mandy,

Let me post interim results of the performance evaluation, though I'm
still measuring benchmarks and analyzing them.

For SPECjbb2015, skipping storing null (webrev.01) was faster than
making
methodAccessor non-volatile (webrev.02).  The improvements of each of
the
patches in maxJOPS/criticalJOPS were 2.6%/3.9% and 1.8%/2.9%,
respectively.  This is only an average of six runs.

For DaCapo, the results were mixed.  In some benchmark, both of the
changes degraded performance.  In some others, webrev.01 was better, but

weberv.02 was better in some others.

I'll continue evaluation, but it is helpful if you could give me some
hints on why webrev.01 can be better than webrev.02 in SPECjbb2015.

Regards,
Ogata

Kazunori Ogata/Japan/IBM wrote on 2019/08/21 20:02:41:

From: Kazunori Ogata/Japan/IBM
To: Mandy Chung <mandy.ch...@oracle.com>
Cc: core-libs-dev@openjdk.java.net
Date: 2019/08/21 20:02
Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy()
and leafCopy()
Hi Mandy,

Thank you for reviewing the webrev.  I updated it to add a space after
"if" and also put four spaces for indentation (it was three).

http://cr.openjdk.java.net/~ogatak/8229871/webrev.01/

Thank you so much for checking the history of fieldAccessor.  I was
surprised that fieldAccessor was made non-volatile in JDK5, but
methodAccessor was left as volatile for 15 years after that...

I agree we need benchmark data.  My simple micro benchmark that
repeats
invoking Class.getMethods() improved performance by 70% when it made
non-
volatile (as shown in the following webrev).  I'll try to run larger
benchmarks, such as SPECjbb2015, to see real impact.

http://cr.openjdk.java.net/~ogatak/8229871/webrev.02/

Regards,
Ogata

Mandy Chung <mandy.ch...@oracle.com> wrote on 2019/08/21 01:21:42:

From: Mandy Chung <mandy.ch...@oracle.com>
To: Kazunori Ogata <oga...@jp.ibm.com>
Cc: core-libs-dev@openjdk.java.net
Date: 2019/08/21 01:21
Subject: [EXTERNAL] Re: RFR: JDK-8229871: Imporve performance of
Method.copy() and leafCopy()

Hi Ogata,

The patch looks okay.  Nit: please add a space between if and (.

About volatile methodAccessor field, I checked the history.  Both
fieldAccessor and methodAccessor were started as volatile and the
fieldAccessor declaration was updated due to JDK-5044412.   As you
observe, I think the methodAccessor field could be made
non-volatile.
OTOH that might impact when it's inflated to spin bytecode for this
method invocation.  I don't know how importance to keep its volatile
vs
non-volatile in practice without doing benchmarking/real application
testing.

Mandy

On 8/19/19 2:51 AM, Kazunori Ogata wrote:
Hi,

May I have review for "JDK-8229871: Imporve performance of
Method.copy()
and leafCopy()"?

Method.copy() and leafCopy() creates a copy of a Method object
with
sharing MethodAccessor object. Since the methodAccessor field is a
volatile variable, copying this field needs memory fence to ensure
the
field is visible to all threads on the weak memory platforms such
as POWER
and ARM.

When the methodAccessor of the root object is null (i.e., not
initialized
yet), we do not need to copy the null value because this field of
the
copied object has been initialized to null in the constructor. We
can
reduce overhead of the memory fence only when the root's
methodAccessor is
non-null. This change improved performance by 5.8% using a micro
benchmark
that repeatedly invokes Class.getMethods().

Bug: https://bugs.openjdk.java.net/browse/JDK-8229871

Webrev: http://cr.openjdk.java.net/~ogatak/8229871/webrev.00/


By the way, why Method.methodAccessor is volatile, while
Field.fieldAccessor and Field.overrideFieldAccessor are not
volatile?  I
know the use of volatile reduces probability of creating
duplicated method
accessor, but the chance still exists.  I couldn't find the
difference
between Method and Field classes to make Method.methodAccessor
volatile.
If we can make it non-volatile, it is more preferable than a quick
hack
above.


Regards,
Ogata





Reply via email to