On 14.01.26 15:32, Gianluca Sartori wrote:
[...]
  *

    *Groovy 5* moved most dynamic calls to |CallSite| caching through
    |indy|.

Correct, though call site caching existed before as well, in the equivalent code produced by the bytecode generation.

  *

    *Groovy 3* often inlined certain calls more aggressively, sometimes
    relying on slower reflection but faster in microbenchmarks due to
    simpler call chains.

Which type of calls, though? It was maybe faster in some micro-benchmarks because we had a full primitive path parallel to the non-primitive path. That only worked with unchanged meta classes, so I am not sure Grails got much out of it. It also shows a big difference: a micro-benchmark is not an app, and you have to know what you are actually testing.

  *

    *Result:* |invokedynamic| can introduce overhead for short-lived
    calls or highly polymorphic code because of frequent call site
    relinking.

that would have impacted performance on non-indy as well


Key reasons for slowness:

 1.

    *Polymorphic call sites:* If your code calls many different methods
    dynamically at the same call site, the |invokedynamic| bootstrap has
    to relink repeatedly.

 2.

    *CallSite cache invalidation:* Changes to the meta-class or dynamic
    method addition can invalidate call sites.

as I said, that applies to both variants.

 3.

    *Boxing/unboxing overhead:* Primitive-heavy code may suffer due to
    dynamic dispatch.

well, that is not supposed to be an issue in indy. It was in the old callsite code in Groovy 3.

 4.

    *JIT warmup issues:* The JVM may take longer to optimize
    |indy|-based dispatch.

that is actually a factor. But a proper micro-benchmark will not measure warmup times, right?
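
To make the warmup point concrete, here is a minimal hand-rolled sketch in Java that separates warmup rounds from measured rounds. The workload `work` and the round counts are placeholders chosen for illustration, and a real benchmark would normally rely on a harness such as JMH rather than on code like this.

```java
public class WarmupSketch {
    // Hypothetical workload standing in for the code under test.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Runs the workload `rounds` times and returns the average nanoseconds per round.
    static long averageNanos(int rounds, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += work(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot drop the loop as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return elapsed / rounds;
    }

    public static void main(String[] args) {
        long cold = averageNanos(1, 10_000);      // first call: interpreter, no JIT yet
        averageNanos(2_000, 10_000);              // warmup rounds, result discarded
        long warm = averageNanos(2_000, 10_000);  // measured rounds
        System.out.println("cold ns/round: " + cold);
        System.out.println("warm ns/round: " + warm);
    }
}
```

Only the last figure says anything about steady-state dispatch cost; the first one mostly measures the interpreter and compilation.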

------------------------------------------------------------------------


      *2. Optimization Strategies for a Next Groovy Version*


        *A. Improve CallSite Caching*

  *

    Implement *multi-level caching* for polymorphic call sites.

  *

    Use *polymorphic inline caches (PICs)* like modern JavaScript
    engines (V8) to avoid relinking.

basically agree
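
As an illustration of the PIC idea, the following Java sketch chains `guardWithTest` handles on a `MutableCallSite`, so that a new receiver class adds a cache entry instead of replacing the previous one. It is a sketch under assumptions, not Groovy's actual linker: the class `PicSketch`, its methods, and the depth limit of 4 are made up for the example, and null receivers are ignored.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class PicSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    private static final int MAX_DEPTH = 4; // assumed limit before the site counts as megamorphic

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    private final MethodHandle invoker = site.dynamicInvoker();
    private int depth;
    int relinks; // number of times the slow path ran

    PicSketch() {
        try {
            MethodHandle slowPath = LOOKUP.findVirtual(PicSketch.class, "relink", SITE_TYPE).bindTo(this);
            site.setTarget(slowPath);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Guard: true only if the receiver has exactly the cached class.
    static boolean sameClass(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    // Slow path: select toString() for the receiver class and put a guard in front of the old chain.
    String relink(Object receiver) throws Throwable {
        relinks++;
        Class<?> type = receiver.getClass();
        MethodHandle target = LOOKUP
                .findVirtual(type, "toString", MethodType.methodType(String.class))
                .asType(SITE_TYPE);
        if (depth < MAX_DEPTH) {
            MethodHandle test = MethodHandles.insertArguments(
                    LOOKUP.findStatic(PicSketch.class, "sameClass",
                            MethodType.methodType(boolean.class, Class.class, Object.class)),
                    0, type);
            // The previous target becomes the fallback, so earlier entries stay cached.
            site.setTarget(MethodHandles.guardWithTest(test, target, site.getTarget()));
            depth++;
        }
        return (String) target.invoke(receiver);
    }

    String call(Object receiver) {
        try {
            return (String) invoker.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        PicSketch pic = new PicSketch();
        for (Object o : new Object[] {"a", 1, "b", 2, 3.0}) {
            System.out.println(pic.call(o));
        }
        System.out.println("relinks: " + pic.relinks);
    }
}
```

With three distinct receiver classes the slow path runs three times in total, no matter how often the classes alternate afterwards.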

  *

    Avoid global call site invalidations when meta-classes are updated —
    make call site invalidation more local.

how?

        *B. Reduce Relinking*

  *

    Track method signatures more strictly. Many |invokedynamic| relinks
    happen because Groovy tries to handle any dynamic call, even when
    the call target is stable.

True. If you call foo(Object) with a String and then with an Integer it will cause relinking, even though it is not required
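
A small Java sketch of that case, assuming `foo(Object)` is the only candidate method. It compares a guard on the exact argument class with a guard on the declared parameter type and counts the relinks. The class `ArgGuardSketch` and its members are hypothetical, this is not Groovy's real guard logic, and null arguments are not handled.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class ArgGuardSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);

    // The only candidate: it accepts any argument, so selection never depends on the argument class.
    static String foo(Object o) {
        return "foo(" + o + ")";
    }

    static boolean sameClass(Class<?> expected, Object arg) {
        return arg != null && arg.getClass() == expected;
    }

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    private final MethodHandle invoker = site.dynamicInvoker();
    private final MethodHandle slowPath;
    private final boolean guardOnExactClass;
    int relinks; // number of times the slow path ran

    ArgGuardSketch(boolean guardOnExactClass) {
        this.guardOnExactClass = guardOnExactClass;
        try {
            slowPath = LOOKUP.findVirtual(ArgGuardSketch.class, "relink", SITE_TYPE).bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        site.setTarget(slowPath);
    }

    String relink(Object arg) throws Throwable {
        relinks++;
        MethodHandle target = LOOKUP.findStatic(ArgGuardSketch.class, "foo", SITE_TYPE);
        MethodHandle test;
        if (guardOnExactClass) {
            // Guard on the runtime class of the argument: String and Integer invalidate each other.
            MethodHandle sameClass = LOOKUP.findStatic(ArgGuardSketch.class, "sameClass",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            test = MethodHandles.insertArguments(sameClass, 0, arg.getClass());
        } else {
            // Guard on the declared parameter type (Object): any non-null argument passes.
            test = LOOKUP.findVirtual(Class.class, "isInstance",
                    MethodType.methodType(boolean.class, Object.class)).bindTo(Object.class);
        }
        site.setTarget(MethodHandles.guardWithTest(test, target, slowPath));
        return (String) target.invoke(arg);
    }

    String call(Object arg) {
        try {
            return (String) invoker.invoke(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        ArgGuardSketch exact = new ArgGuardSketch(true);
        ArgGuardSketch declared = new ArgGuardSketch(false);
        for (Object o : new Object[] {"s", 1, "t"}) {
            exact.call(o);
            declared.call(o);
        }
        System.out.println("exact-class guard relinks:   " + exact.relinks);
        System.out.println("declared-type guard relinks: " + declared.relinks);
    }
}
```

The weaker guard is only safe while no other overload of `foo` could be selected for a different argument type, which is exactly the condition a linker would have to prove.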

  *

    Consider *specialized bootstrap methods* for common call patterns:

      o

        e.g., frequent calls to |String| methods, |List|/|Map| operations.

What are we actually saving here?

        *C. Optimize Primitive Handling*

  *

    Introduce *primitive specialization* for arithmetic and collection
    operations.

  *

    Reduce boxing/unboxing by generating specialized call site versions
    for primitives (like what Kotlin/JVM or Scala do with inline functions).

we already have that

        *D. Optional Static Call Optimization*

  *

    Provide *hybrid static/dynamic dispatch*:

      o

        Use static compilation (|@CompileStatic|) when possible.

on the compiler side directly? Well... when is it possible?

      o

        Use a *profiling-guided JIT* to replace call sites with direct
        method handles if a single target dominates.

well.. and how does the invalidation work if the target is suddenly incorrect?
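
One mechanism the JDK offers for this question is `java.lang.invoke.SwitchPoint`: the direct handle is installed behind a switch point, and invalidating that switch point sends every call back to the relinking path. The Java sketch below shows only the mechanism. `SwitchPointSketch`, `oldTarget`, `newTarget` and `changeTarget` are invented names, where `changeTarget` stands in for something like a meta class change, and nothing here claims to describe what a Groovy implementation does or should do.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

public class SwitchPointSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE = MethodType.methodType(String.class);

    static String oldTarget() { return "old"; }
    static String newTarget() { return "new"; }

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    private final MethodHandle invoker = site.dynamicInvoker();
    private final MethodHandle slowPath;
    private SwitchPoint stillValid = new SwitchPoint();
    private String selectedMethod = "oldTarget"; // what method selection currently yields
    int relinks; // number of times the slow path ran

    SwitchPointSketch() {
        try {
            slowPath = LOOKUP.findVirtual(SwitchPointSketch.class, "relink", SITE_TYPE).bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        site.setTarget(slowPath);
    }

    // Slow path: select the target and install it behind the current switch point.
    String relink() throws Throwable {
        relinks++;
        MethodHandle direct = LOOKUP.findStatic(SwitchPointSketch.class, selectedMethod, SITE_TYPE);
        site.setTarget(stillValid.guardWithTest(direct, slowPath));
        return (String) direct.invoke();
    }

    // Hypothetical "the target is suddenly incorrect" event.
    void changeTarget() {
        selectedMethod = "newTarget";
        SwitchPoint stale = stillValid;
        stillValid = new SwitchPoint();
        // From now on every handle guarded by `stale` takes its fallback, i.e. relinks.
        SwitchPoint.invalidateAll(new SwitchPoint[] {stale});
    }

    String call() {
        try {
            return (String) invoker.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPointSketch s = new SwitchPointSketch();
        System.out.println(s.call() + " " + s.call() + ", relinks: " + s.relinks);
        s.changeTarget();
        System.out.println(s.call() + " " + s.call() + ", relinks: " + s.relinks);
    }
}
```

The cost sits in the granularity: one switch point shared by many call sites invalidates all of them at once, while one per class or per method needs bookkeeping to find the right one on every change.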

        *E. Bytecode Generation Improvements*

  *

    Investigate how Groovy 5 generates |invokedynamic| bytecode:

      o

        Avoid unnecessary |Object| casts.

      o

        Combine multiple small dynamic calls into a single bootstrap
        call to reduce overhead.

We have that already, afaik

  *

    Possibly generate *direct method handles* for commonly called Groovy
    methods (|size()|, |get()|, etc.).

see above

        *F. JIT-Friendly Bootstrap*

  *

    Groovy could provide *simpler bootstrap methods* to allow JVM JIT
    inlining:

      o

        Reduce bootstrap method complexity to help HotSpot optimize the
        call site faster.

It is not about bootstrap complexity; rather, the resulting handle should contain as little complexity as possible.

------------------------------------------------------------------------


      *3. Micro-Optimizations for Library Authors*

If you are writing a library or code in Groovy that must be fast:

 1.

    *Prefer static types* whenever possible — even without
    |@CompileStatic|, type hints help.

which leads to more casting... so no.

 2.

    *Use |@CompileStatic|* selectively for hot loops.

possibly

 3.

    *Avoid meta-class changes* at runtime in performance-critical code.

well, there is currently no way to protect an area against meta class changes

 4.

    *Cache dynamic lookups manually* for very hot methods.

not understood

 5.

    *Use primitive arrays* instead of boxed lists when dealing with numbers.

possibly.

 6.

    *Minimize polymorphism at call sites* — repeated calls to the same
    method are much faster than alternating between multiple methods.

correct

------------------------------------------------------------------------


      *4. Experimental Ideas for Groovy 6+*

  *

    *CallSite specialization per type* (like Truffle/Graal dynamic
    languages).

I would like to make a MethodHandle part of the MetaMethod. I guess that goes in that direction.

  *

    *Inline small closures automatically* at compile-time.

It is not only the code of the Closure, it is also the code of the method handling the Closure. In i=0; n.times{i++} it does not actually help to "inline" i++, because we would have to inline it into the times method. Instead we would have to inline times as well, of course with the danger that a change to times will not be reflected without recompilation.

  *

    *Profile-guided call site replacement*: replace |invokedynamic|
    calls with direct |MethodHandle| or static calls at runtime if
    profiling shows a stable target.

I cannot replace it with static calls

  *

    *Better JIT feedback*: provide hints to HotSpot that certain call
    sites are monomorphic or polymorphic.

How?

bye Jochen
