On Wed, 17 Dec 2025 09:32:43 GMT, Andrew Haley <[email protected]> wrote:
>> Extend MOVK-based scheme to MOVK/MOVZ allowing to store 19 bits of metadata.
>>
>> Choose number of metadata slots in post-call NOP sequence between 1 and 2
>> depending on the offset from the CodeBlob header.
>>
>> Additionally, implement ADR/ADRP-based metadata storage - that provides 22
>> bits instead of 16 bits to store metadata. This can be enabled via
>> UsePostCallSequenceWithADRP option.
>>
>>
>> Renaissance 0.15.0 benchmark results (MOVK-based scheme)
>> Neoverse V1.
>> The runs were limited to 16 cores.
>>
>> Number of runs:
>> 6 for baseline, 6 for the changes - interleaved pairs.
>>
>> Command line:
>> java -jar renaissance-jmh-0.15.0.jar \
>> -bm avgt -gc true -v extra \
>> -jvmArgsAppend '-Xbatch -XX:-UseDynamicNumberOfCompilerThreads \
>> -XX:-CICompilerCountPerCPU -XX:ActiveProcessorCount=16 \
>> -XX:CICompilerCount=2 -Xms8g -Xmx8g -XX:+AlwaysPreTouch \
>> -XX:+UseG1GC'
>>
>> The change is geometric mean of ratios across the 6 pairs of runs.
>>
>> | Benchmark                                             | Change   | 90% CI for the change |
>> | ----------------------------------------------------- | -------- | --------------------- |
>> | org.renaissance.actors.JmhAkkaUct.run                 | -0.215%  | -2.652% to 1.357%     |
>> | org.renaissance.actors.JmhReactors.run                | -0.166%  | -1.974% to 1.775%     |
>> | org.renaissance.jdk.concurrent.JmhFjKmeans.run        | 0.222%   | -0.492% to 0.933%     |
>> | org.renaissance.jdk.concurrent.JmhFutureGenetic.run   | -1.880%  | -2.438% to -1.343%    |
>> | org.renaissance.jdk.streams.JmhMnemonics.run          | -0.500%  | -1.032% to 0.089%     |
>> | org.renaissance.jdk.streams.JmhParMnemonics.run       | -0.740%  | -2.092% to 0.639%     |
>> | org.renaissance.jdk.streams.JmhScrabble.run           | -0.031%  | -0.353% to 0.310%     |
>> | org.renaissance.neo4j.JmhNeo4jAnalytics.run           | -0.873%  | -2.323% to 0.427%     |
>> | org.renaissance.rx.JmhRxScrabble.run                  | -0.512%  | -1.121% to 0.049%     |
>> | org.renaissance.scala.dotty.JmhDotty.run              | -0.219%  | -1.108% to 0.708%     |
>> | org.renaissance.scala.sat.JmhScalaDoku.run            | -2.750%  | -6.426% to -0.827%    |
>> | org.renaissance.scala.stdlib.JmhScalaKmeans.run       | 0.046%   | -0.383% to 0.408%     |
>> | org.renaissance.scala.stm.JmhPhilosophers.run         | 1.497%   | -0.955% to 3.923%     |
>> | org.renaissance.scala.stm.JmhScalaStmBench7.run ...
>
> While the basic idea is a good one, I think there are better ways to do it.
>
> Consider fixing the number of instructions to two rather than three. Then a
> post-call NOP can be either `nop; movk/movz` or `b.nv; movk/adrp`. Or
> something similar. `b.nv` is more expensive, but should be rare.

Thank you for the feedback, @theRealAph. I will be out of office until early January; I plan to respond in more detail and update the PR when I am back.

I agree that a fixed-length post-call NOP sequence would be preferable and would make the implementation simpler.

`B.nv` might not be suitable in this case - I believe the branch will be "always taken". However, I had considered using `CBNZ XZR, <#imm>`. So far, I have avoided implementing it because it is unclear what performance effects it might have. While I agree that the case will be rare in terms of static occurrences, it might still impact performance if a particular instance is on a frequent execution path.

I believe the change in C1 floating-point constants handling is a dependency for the current implementation: it allows keeping the constants section empty and ensures the compiler can determine the offset to the code blob header before the code buffer is finalized. However, if the approach is changed to a fixed-length post-call NOP sequence, there will be no dependency between these two optimizations.

I will revisit both the post-call NOP sequence structure and the dependency on floating-point constants handling when I return in January.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28855#issuecomment-3670298853
