Hi, I re-evaluated the performance of Peter's improvement [1] using the same base version. Sorry for taking long time, but it took time to verify if results are reliable.
The Peter's version reduced elapsed time of a micro bench [2] that repeatedly calls Class.getMethods() by 30.3%. However, this change did not affect so much on the scores of macro benchmarks, like SPECjbb2015 and DaCapo. For SPECjbb2015, critical jOPS improved by 0.4%, but max jOPS degraded by 0.1%. (75 percentile of 20 runs) The magnitude of the improvement/degradation was very small, in any way. For DaCapo, performance differences range from +2.9% (tradebeans) to -2.6% (lusearch) in 60 percentile of 50 runs. Since DaCapo scores tends to fluctuate from run to run, I think this result is in a range of run-to-run variation. May I ask to push this change to the repository? I guess these programs don't use extensively use reflective method invocation, but it does improve performance of Method.copy(). Or should I need to find a macro benchmark that extensively use reflective method invocation? [1] http://cr.openjdk.java.net/~plevart/jdk-dev/ 8229871_Method.methodAccessor/webrev.01/ [2] http://cr.openjdk.java.net/~ogatak/8229871/GetMethodsBench.java Regards, Ogata Kazunori Ogata/Japan/IBM wrote on 2019/10/31 01:46:45: > From: Kazunori Ogata/Japan/IBM > To: Peter Levart <peter.lev...@gmail.com> > Cc: core-libs-dev@openjdk.java.net > Date: 2019/10/31 01:46 > Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() and leafCopy() > > Hi Peter, > > Thank you very much for your updated fix and sorry to be late to reply. > > I found that the performance data I posted earlier was wrong because I > fetched the latest code before building the JVM with your fix, while I > still used older JVM as the base version. The new build picked up > JDK-8230020 [1], which reverts JDK-8225670 [2] that degraded performance > of SPECjbb2015. Unfortunately, the base version only included [2]... > > Your new version [3] apparently looks better. I'll update my base JVM and > measure the performance of [3]. > > Regards, > Ogata > > [1] https://bugs.openjdk.java.net/browse/JDK-8230020 > [2] https://bugs.openjdk.java.net/browse/JDK-8225670 > [3] http://cr.openjdk.java.net/~plevart/jdk-dev/ > 8229871_Method.methodAccessor/webrev.01/ > > Peter Levart <peter.lev...@gmail.com> wrote on 2019/10/24 06:09:54: > > > From: Peter Levart <peter.lev...@gmail.com> > > To: Kazunori Ogata <oga...@jp.ibm.com> > > Cc: core-libs-dev@openjdk.java.net > > Date: 2019/10/24 06:10 > > Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of > > Method.copy() and leafCopy() > > > > Hi Ogata, > > > > I finally managed to find some time to experiment with this. To measure > > invocation performance I created the following JMH benchmark [1]. It > > measures the invocation speed of instance and static methods using either: > > - direct invocation (bytecodes) > > - invocation via constant Method instance > > - invocation via variable Method instance > > > > Here are the results using unmodified JDK 14 build (baseline): > > > > Benchmark Mode Cnt Score > > Error Units > > ReflectionSpeedBenchmark.instanceDirect avgt 10 2.272 ± > > 0.002 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 16.609 ± > > 0.162 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.715 ± > > 0.163 ns/op > > ReflectionSpeedBenchmark.staticDirect avgt 10 2.275 ± > > 0.012 ns/op > > ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 16.351 ± > > 0.330 ns/op > > ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 16.259 ± > > 0.196 ns/op > > > > Your webrev.04 [2] has a slight (~ 6%) improvement for constant Method > > instance (i.e. assigned to static final field): > > > > Benchmark Mode Cnt Score > > Error Units > > ReflectionSpeedBenchmark.instanceDirect avgt 10 2.273 ± > > 0.003 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 15.628 ± > > 0.115 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.706 ± > > 0.144 ns/op > > ReflectionSpeedBenchmark.staticDirect avgt 10 2.277 ± > > 0.008 ns/op > > ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 15.285 ± > > 0.109 ns/op > > ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 16.600 ± > > 0.222 ns/op > > > > Now I have prepared another variant [3] that replaces > > DelegatingMethodAccessorImpl with SlowFastMethodAccessorImpl and > > produces the following result: > > > > Benchmark Mode Cnt Score > > Error Units > > ReflectionSpeedBenchmark.instanceDirect avgt 10 2.371 ± > > 0.027 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 7.161 ± > > 0.066 ns/op > > ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.501 ± > > 0.154 ns/op > > ReflectionSpeedBenchmark.staticDirect avgt 10 2.373 ± > > 0.017 ns/op > > ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 6.971 ± > > 0.103 ns/op > > ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 15.893 ± > > 0.110 ns/op > > > > This is more than twice as fast as the baseline for constant Method > > instances while not degrading performance for variable Method instances. > > > > > > [1] http://cr.openjdk.java.net/~plevart/jdk-dev/ > 8229871_Method.methodAccessor/ReflectionSpeedBenchmark.java > > [2] http://cr.openjdk.java.net/~ogatak/8229871/webrev.04/ > > [3] http://cr.openjdk.java.net/~plevart/jdk-dev/ > 8229871_Method.methodAccessor/webrev.01/ > > > > > > Could you spin this one [3] on your SPECjbb2015 benchmark to see if it > > still performs favorably? > > > > > > Regards, Peter > > > > On 10/11/19 12:17 PM, Kazunori Ogata wrote: > > > Hi Peter, > > > > > > Thank you for the comment and suggestion of the fix. > > > > > > I tried to pick up your change w.r.t. methodAccessor: > > > https://urldefense.proofpoint.com/v2/url? > > u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev. > > 04_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=-xRlUE3M_VEQ_pLDsVNMsIneJ7tKig8ElUy8vmAQoUM&e= > > > > > > > > > Regarding micro benchmark, my original motivation of this change is to > > > improve performance of Class.getMethods(), which calls Method.copy() for > > > each declared method to create a copy of Method[]. > > > > > > I measured my simple microbench: > > > https://urldefense.proofpoint.com/v2/url? > > > u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_GetMethodsBench.java&d=DwIDaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=Phaibyh6EWjUKos14T7aQfBzSGcH4stxqnhQFkEZsp4&e= > > > > > > Base code: Elapsed time = 4808 ms > > > webrev.01: Elapsed time = 4536 ms (+ 6%) > > > webrev.02: Elapsed time = 2331 ms (+106%) > > > webrev.04: Elapsed time = 3746 ms (+ 28%) > > > > > > I'll measure larger benchmark and try to think if we can reduce the > > > overhead of memory barrier. > > > > > > > > > Regards, > > > Ogata > > > > > > > > > Peter Levart <peter.lev...@gmail.com> wrote on 2019/10/09 16:44:13: > > > > > >> From: Peter Levart <peter.lev...@gmail.com> > > >> To: Kazunori Ogata <oga...@jp.ibm.com>, core-libs-dev@openjdk.java.net > > >> Date: 2019/10/09 16:44 > > >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of > > >> Method.copy() and leafCopy() > > >> > > >> Hi Ogata, > > >> > > >> May I just add that volatile field ensured that MethodAccessor object > > > was > > >> correctly published. DelegatingMethodAccessortImpl is not safe to be > > >> published via data race because it contains plain `delegate` field > > >> initialized in the constructor. In addition, the object that is first > > >> assigned to that field is NativeMethodAccessorImpl which has plain > > >> `parent` field. You can get NPE when invoking the Method.invoke from > > >> multuiple threads if Method.methodAccessor is not volatile. > > >> > > >> In addition, It would be nice to have two microbenchmarks exercising: > > >> a) Method copy performance > > >> b) Method invocation performance > > >> > > >> Regards, Peter > > >> > > >> P.S. When exploring the possibility of an alternative MethodAccessor > > >> implementation (using MethodHandle(s)): > > >> > > >> > > > https://urldefense.proofpoint.com/v2/url? > > > u=http-3A__cr.openjdk.java.net_-7Eplevart_jdk-2Ddev_6824466-5FMHReflectionAccessors_&d=DwIDaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=LKR_2z3fvXB0IYUryijzgd-jH6wG3Mr2UmiOMKviFGU&e= > > >> webrev.00.2/ > > >> > > >> ...I found out that it was possible to re-arrange the play between > > >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and generated > > >> MethodAccessor in such a way that the DelegatingMethodAccessortImpl > > >> becomes safe to be published via data race. This allowed for > > >> Method.methodAccessor field to become plain field. In addition this > > > field > > >> can be made @Stable which further optimizes access to MethodAccessor > > >> instance when Method instance can be constant-folded, which showed in > > >> special microbenchmarks. > > >> > > >> So perhaps you could try to use above implementation (just changes to > > >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and part of > > >> Reflection factory but without MH* stuff) and measure it against current > > >> and your implementation (which as shown above has a data-race flaw). > > >> On 10/8/19 12:23 PM, Kazunori Ogata wrote: > > >> Hi all, > > >> > > >> I posted two changes and got reply that performance evaluation is > > > needed. > > >> I found that making Method.methodAccessor non-volatile (webrev.02) is > > >> better than avoid copying methodAccessor value when it is null > > >> (webrev.01), as shown below. > > >> > > >> So I'd like to ask review of the former change. I updated weberv using > > >> the latest code base (though there was no difference from webrev.02): > > >> > > >> Webrev: https://urldefense.proofpoint.com/v2/url? > > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev. > > 03_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=xm_iw74CmqAabV2cctZfI75t28_DCXP9VFVjHcnQXp4&e= > > >> > > >> Bug: https://urldefense.proofpoint.com/v2/url? > > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e= > > >> > > >> > > >> For this performance evaluation, I calculated 75 percentile of 9 runs of > > >> SPECjbb2015 and 60 percentile of 50 runs of DaCapo to omit outliers. I > > >> bound a JVM to a NUMA node and set the number of GC threads to the same > > > as > > >> the number of physical cores. These tuning reduced run-to-run > > > fluctuation > > >> on POWER (as usual...). > > >> > > >> SPECjbb2015: > > >> webrev.02: critical jOPS +1.6%, max jOPS +0.2% > > >> webrev.01: critical jOPS +0.4%, max jOPS +0.2% > > >> > > >> > > >> For DaCapo, some benchmark still improved performance and some degraded, > > >> but the geometric mean of all benchmarks were small: > > >> weberv.02: +0.3% > > >> weberv.01: +0.2% > > >> > > >> The difference of improvement/degradation between the two changes in > > > each > > >> benchmark were less than 0.8%. > > >> > > >> The range of improvement/degradation in each benchmark were: > > >> webrev.02: between +2.4% and -1.0% > > >> webrev.01: between +1.6% and -1.8% > > >> > > >> > > >> So I think webrev.02 (i.e., making methodAccessor non-volatile) is a > > > good > > >> change, since it improved SPECjbb critical jOPS by 1.6%. > > >> > > >> > > >> Regards, > > >> Ogata > > >> > > >> > > >> Kazunori Ogata/Japan/IBM wrote on 2019/08/27 15:41:39: > > >> > > >> From: Kazunori Ogata/Japan/IBM > > >> To: Mandy Chung <mandy.ch...@oracle.com> > > >> Cc: core-libs-dev@openjdk.java.net > > >> Date: 2019/08/27 15:41 > > >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() and > > >> leafCopy() > > >> Hi Mandy, > > >> > > >> Let me post interim results of the performance evaluation, though I'm > > >> still measuring benchmarks and analyzing them. > > >> > > >> For SPECjbb2015, skipping storing null (webrev.01) was faster than > > >> making > > >> methodAccessor non-volatile (webrev.02). The improvements of each of > > >> the > > >> patches in maxJOPS/criticalJOPS were 2.6%/3.9% and 1.8%/2.9%, > > >> respectively. This is only an average of six runs. > > >> > > >> For DaCapo, the results were mixed. In some benchmark, both of the > > >> changes degraded performance. In some others, webrev.01 was better, but > > > > > >> weberv.02 was better in some others. > > >> > > >> I'll continue evaluation, but it is helpful if you could give me some > > >> hints on why webrev.01 can be better than webrev.02 in SPECjbb2015. > > >> > > >> Regards, > > >> Ogata > > >> > > >> Kazunori Ogata/Japan/IBM wrote on 2019/08/21 20:02:41: > > >> > > >> From: Kazunori Ogata/Japan/IBM > > >> To: Mandy Chung <mandy.ch...@oracle.com> > > >> Cc: core-libs-dev@openjdk.java.net > > >> Date: 2019/08/21 20:02 > > >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() > > >> and leafCopy() > > >> Hi Mandy, > > >> > > >> Thank you for reviewing the webrev. I updated it to add a space after > > >> "if" and also put four spaces for indentation (it was three). > > >> > > >> https://urldefense.proofpoint.com/v2/url? > > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev. > > 01_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=qLT9k5xsheWZfU7ocimSbEMANQDnelEUqqiR5X-Zio4&e= > > >> > > >> Thank you so much for checking the history of fieldAccessor. I was > > >> surprised that fieldAccessor was made non-volatile in JDK5, but > > >> methodAccessor was left as volatile for 15 years after that... > > >> > > >> I agree we need benchmark data. My simple micro benchmark that > > >> repeats > > >> invoking Class.getMethods() improved performance by 70% when it made > > >> non- > > >> volatile (as shown in the following webrev). I'll try to run larger > > >> benchmarks, such as SPECjbb2015, to see real impact. > > >> > > >> https://urldefense.proofpoint.com/v2/url? > > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev. > > 02_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=aq50ONJW0fK7CBk1upVkJekAbRrDsZygPkWrjL_sM4I&e= > > >> > > >> Regards, > > >> Ogata > > >> > > >> Mandy Chung <mandy.ch...@oracle.com> wrote on 2019/08/21 01:21:42: > > >> > > >> From: Mandy Chung <mandy.ch...@oracle.com> > > >> To: Kazunori Ogata <oga...@jp.ibm.com> > > >> Cc: core-libs-dev@openjdk.java.net > > >> Date: 2019/08/21 01:21 > > >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Imporve performance of > > >> Method.copy() and leafCopy() > > >> > > >> Hi Ogata, > > >> > > >> The patch looks okay. Nit: please add a space between if and (. > > >> > > >> About volatile methodAccessor field, I checked the history. Both > > >> fieldAccessor and methodAccessor were started as volatile and the > > >> fieldAccessor declaration was updated due to JDK-5044412. As you > > >> observe, I think the methodAccessor field could be made > > >> non-volatile. > > >> OTOH that might impact when it's inflated to spin bytecode for this > > >> method invocation. I don't know how importance to keep its volatile > > >> vs > > >> non-volatile in practice without doing benchmarking/real application > > >> testing. > > >> > > >> Mandy > > >> > > >> On 8/19/19 2:51 AM, Kazunori Ogata wrote: > > >> Hi, > > >> > > >> May I have review for "JDK-8229871: Imporve performance of > > >> Method.copy() > > >> and leafCopy()"? > > >> > > >> Method.copy() and leafCopy() creates a copy of a Method object > > >> with > > >> sharing MethodAccessor object. Since the methodAccessor field is a > > >> volatile variable, copying this field needs memory fence to ensure > > >> the > > >> field is visible to all threads on the weak memory platforms such > > >> as POWER > > >> and ARM. > > >> > > >> When the methodAccessor of the root object is null (i.e., not > > >> initialized > > >> yet), we do not need to copy the null value because this field of > > >> the > > >> copied object has been initialized to null in the constructor. We > > >> can > > >> reduce overhead of the memory fence only when the root's > > >> methodAccessor is > > >> non-null. This change improved performance by 5.8% using a micro > > >> benchmark > > >> that repeatedly invokes Class.getMethods(). > > >> > > >> Bug: https://urldefense.proofpoint.com/v2/url? > > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e= > > >> > > >> Webrev: https://urldefense.proofpoint.com/v2/url? > > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev. > > 00_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb- > > K9uCUrU_Yw&s=lGQy-Xy0ofp8d551jCUZdwmZ_OD4sXsMaoRKWzwer4o&e= > > >> > > >> > > >> By the way, why Method.methodAccessor is volatile, while > > >> Field.fieldAccessor and Field.overrideFieldAccessor are not > > >> volatile? I > > >> know the use of volatile reduces probability of creating > > >> duplicated method > > >> accessor, but the chance still exists. I couldn't find the > > >> difference > > >> between Method and Field classes to make Method.methodAccessor > > >> volatile. > > >> If we can make it non-volatile, it is more preferable than a quick > > >> hack > > >> above. > > >> > > >> > > >> Regards, > > >> Ogata > > >> > > >> > > >> > > > > > > >