Re: Groovy > 3 performance

Gianluca Sartori Wed, 14 Jan 2026 06:34:30 -0800

Some more s**t from the probability text generator.

*Q:* Groovy 5 is slower than Groovy 3. I would like to know how to optimize
performance of invokedynamic in a next version of Groovy


*A:*

Ah, yes — the invokedynamic slowdown in Groovy 5 has been a common concern.
Groovy historically relies on invokedynamic to implement dynamic method
calls efficiently, but changes in Groovy 4/5 — especially the full embrace
of indy for method dispatch — can actually make some code slower compared
to Groovy 3. Let’s break this down carefully and see how you could optimize
performance in a future Groovy version.
------------------------------
*1. Understanding the Problem*

Groovy uses invokedynamic for dynamic dispatch. In theory, invokedynamic
lets the JVM optimize repeated method calls at runtime, but in practice:

   - *Groovy 5* moved most dynamic calls to CallSite caching through indy.
   - *Groovy 3* often inlined certain calls more aggressively, sometimes
     relying on slower reflection but faster in microbenchmarks due to simpler
     call chains.
   - *Result:* invokedynamic can introduce overhead for short-lived calls or
     highly polymorphic code because of frequent call site relinking.

Key reasons for slowness:

   1. *Polymorphic call sites:* If your code calls many different methods
      dynamically at the same call site, the invokedynamic bootstrap has to
      relink repeatedly.
   2. *CallSite cache invalidation:* Changes to the meta-class or dynamic
      method addition can invalidate call sites.
   3. *Boxing/unboxing overhead:* Primitive-heavy code may suffer due to
      dynamic dispatch.
   4. *JIT warmup issues:* The JVM may take longer to optimize indy-based
      dispatch.
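
The invalidation point in the list above can be illustrated with the plain
java.lang.invoke API. This is only a sketch under stated assumptions, not
Groovy's actual call site code: the class InvalidationSketch, its relink
method and the metaClassChanged hook are hypothetical names, and a single
SwitchPoint stands in for "the meta-class changed".

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch; names are illustrative and are not Groovy internals.
final class InvalidationSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodHandle SLOW_PATH;
    private static final MethodHandle TARGET;

    static {
        try {
            SLOW_PATH = LOOKUP.findVirtual(InvalidationSketch.class, "relink",
                    MethodType.methodType(String.class, String.class));
            TARGET = LOOKUP.findVirtual(String.class, "toUpperCase",
                    MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Stand-in for meta-class state: invalidating it forces guarded sites to relink.
    SwitchPoint metaClassGuard = new SwitchPoint();
    final MutableCallSite site =
            new MutableCallSite(MethodType.methodType(String.class, String.class));
    int relinks; // how many times the slow path ran

    InvalidationSketch() {
        site.setTarget(SLOW_PATH.bindTo(this));
    }

    // Slow path: install the target again, guarded by the current switch point.
    String relink(String receiver) throws Throwable {
        relinks++;
        site.setTarget(metaClassGuard.guardWithTest(TARGET, SLOW_PATH.bindTo(this)));
        return (String) TARGET.invokeExact(receiver);
    }

    // Simulates a meta-class change: sites guarded by the old switch point fall back.
    void metaClassChanged() {
        SwitchPoint.invalidateAll(new SwitchPoint[] {metaClassGuard});
        metaClassGuard = new SwitchPoint();
    }

    String call(String receiver) {
        try {
            return (String) site.dynamicInvoker().invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        InvalidationSketch s = new InvalidationSketch();
        for (int i = 0; i < 1_000; i++) s.call("groovy"); // links once
        s.metaClassChanged();
        for (int i = 0; i < 1_000; i++) s.call("groovy"); // relinks once more
        System.out.println("relinks = " + s.relinks);
    }
}
```

In this toy setup each invalidation costs one extra trip through the slow
path. How expensive that trip is in a real runtime depends on how much work
the bootstrap and method selection do, which the sketch does not model.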

------------------------------
*2. Optimization Strategies for a Next Groovy Version*

*A. Improve CallSite Caching*

   - Implement *multi-level caching* for polymorphic call sites.
   - Use *polymorphic inline caches (PICs)* like modern JavaScript engines
     (V8) to avoid relinking.
   - Avoid global call site invalidations when meta-classes are updated;
     make call site invalidation more local.
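
For readers unfamiliar with the term, a polymorphic inline cache can be
sketched with MethodHandles.guardWithTest. The following is a minimal
illustration, not a description of how Groovy links call sites: PicSketch,
MAX_DEPTH and the fallback method are assumed names, and the cache only
handles public no-argument methods that return String.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch of a polymorphic inline cache; not Groovy's implementation.
final class PicSketch {
    static final int MAX_DEPTH = 4; // assumed cap on chained guards

    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodHandle IS_INSTANCE;
    private static final MethodHandle FALLBACK;

    static {
        try {
            IS_INSTANCE = LOOKUP.findVirtual(Class.class, "isInstance",
                    MethodType.methodType(boolean.class, Object.class));
            FALLBACK = LOOKUP.findVirtual(PicSketch.class, "fallback",
                    MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    final MutableCallSite site =
            new MutableCallSite(MethodType.methodType(Object.class, Object.class));
    final String methodName;
    int depth;   // guards currently chained in front of the fallback
    int relinks; // how many times the slow path ran

    PicSketch(String methodName) {
        this.methodName = methodName;
        site.setTarget(FALLBACK.bindTo(this));
    }

    // Slow path: resolve the method for this receiver type, then prepend a type guard
    // so later calls with the same type skip the lookup.
    Object fallback(Object receiver) throws Throwable {
        relinks++;
        Class<?> type = receiver.getClass();
        MethodHandle target = LOOKUP
                .findVirtual(type, methodName, MethodType.methodType(String.class))
                .asType(MethodType.methodType(Object.class, Object.class));
        if (depth < MAX_DEPTH) {
            MethodHandle guard = IS_INSTANCE.bindTo(type);
            site.setTarget(MethodHandles.guardWithTest(guard, target, site.getTarget()));
            depth++;
        }
        return target.invoke(receiver);
    }

    Object call(Object receiver) {
        try {
            return site.dynamicInvoker().invoke(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        PicSketch pic = new PicSketch("toString");
        Object[] receivers = {1, "two", 3.0, 4, "five", 6.0};
        for (Object r : receivers) System.out.println(pic.call(r));
        System.out.println("relinks = " + pic.relinks);
    }
}
```

With three receiver types the slow path runs three times and the remaining
calls are served by the guard chain. Once more than MAX_DEPTH types show up,
this sketch simply keeps hitting the fallback; a real runtime would need a
better strategy for that megamorphic case.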

*B. Reduce Relinking*

   - Track method signatures more strictly. Many invokedynamic relinks happen
     because Groovy tries to handle any dynamic call, even when the call target
     is stable.
   - Consider *specialized bootstrap methods* for common call patterns:
      - e.g., frequent calls to String methods, List/Map operations.

*C. Optimize Primitive Handling*

   - Introduce *primitive specialization* for arithmetic and collection
     operations.
   - Reduce boxing/unboxing by generating specialized call site versions for
     primitives (like what Kotlin/JVM or Scala do with inline functions).
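
What "specialized call site versions for primitives" could mean at the
method handle level is sketched below. This is an assumption-laden
illustration, not Groovy's code: PrimitiveSiteSketch and its two handles are
made-up names, and Integer.sum stands in for a dynamic "plus" operation. The
same target is exposed once with an (int,int)int shape and once with the
generic (Object,Object)Object shape an untyped call site would use.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; names are illustrative and are not Groovy internals.
final class PrimitiveSiteSketch {
    // Specialized shape: (int,int)int, no boxing at the call boundary.
    static final MethodHandle INT_PLUS;
    // Generic shape: (Object,Object)Object, as an untyped dynamic call site sees it.
    static final MethodHandle GENERIC_PLUS;

    static {
        try {
            INT_PLUS = MethodHandles.lookup().findStatic(Integer.class, "sum",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        // asType adds the unboxing of both arguments and the boxing of the result.
        GENERIC_PLUS = INT_PLUS.asType(
                MethodType.methodType(Object.class, Object.class, Object.class));
    }

    // Accumulates through the primitive-typed handle.
    static int sumSpecialized(int[] values) {
        try {
            int total = 0;
            for (int v : values) {
                total = (int) INT_PLUS.invokeExact(total, v);
            }
            return total;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Accumulates through the generic handle; every step passes boxed values.
    static Object sumGeneric(int[] values) {
        try {
            Object total = 0; // boxed Integer
            for (int v : values) {
                total = GENERIC_PLUS.invokeExact(total, (Object) v);
            }
            return total;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        System.out.println(sumSpecialized(values));
        System.out.println(sumGeneric(values));
    }
}
```

Both variants compute the same result. Whether the boxing in the generic
variant survives JIT compilation depends on inlining and escape analysis, so
any speed difference would have to be measured rather than assumed.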

*D. Optional Static Call Optimization*

   - Provide *hybrid static/dynamic dispatch*:
      - Use static compilation (@CompileStatic) when possible.
      - Use a *profiling-guided JIT* to replace call sites with direct method
        handles if a single target dominates.

*E. Bytecode Generation Improvements*

   - Investigate how Groovy 5 generates invokedynamic bytecode:
      - Avoid unnecessary Object casts.
      - Combine multiple small dynamic calls into a single bootstrap call to
        reduce overhead.
   - Possibly generate *direct method handles* for commonly called Groovy
     methods (size(), get(), etc.).

*F. JIT-Friendly Bootstrap*

   - Groovy could provide *simpler bootstrap methods* to allow JVM JIT
     inlining:
      - Reduce bootstrap method complexity to help HotSpot optimize the call
        site faster.

------------------------------
*3. Micro-Optimizations for Library Authors*

If you are writing a library or code in Groovy that must be fast:

   1. *Prefer static types* whenever possible; even without @CompileStatic,
      type hints help.
   2. *Use @CompileStatic* selectively for hot loops.
   3. *Avoid meta-class changes* at runtime in performance-critical code.
   4. *Cache dynamic lookups manually* for very hot methods.
   5. *Use primitive arrays* instead of boxed lists when dealing with numbers.
   6. *Minimize polymorphism at call sites:* repeated calls to the same
      method are much faster than alternating between multiple methods.

------------------------------
*4. Experimental Ideas for Groovy 6+*

   - *CallSite specialization per type* (like Truffle/Graal dynamic
     languages).
   - *Inline small closures automatically* at compile-time.
   - *Profile-guided call site replacement*: replace invokedynamic calls with
     direct MethodHandle or static calls at runtime if profiling shows a
     stable target.
   - *Better JIT feedback*: provide hints to HotSpot that certain call sites
     are monomorphic or polymorphic.

------------------------------

💡 *TL;DR:*
invokedynamic in Groovy 5 is slower in polymorphic and highly dynamic
scenarios. Future optimizations should focus on *call site caching,
primitive specialization, reducing relinks, and hybrid static/dynamic
dispatch*, plus smarter bootstrap methods that are JIT-friendly.


Gianluca Sartori
--
https://dueuno.com


On Wed, 14 Jan 2026 at 15:19, Milles, Eric (TR Technology) via dev <
[email protected]> wrote:

> To get a sense for the impact of indy and only indy, you could perform the
> same sorts of tests using 2.4, 2.5 or 3.0 — all offer classic and
> invokedynamic variants.  Thus you could compare 3.0 classic against 3.0
> invokedynamic.
>
> From there, you could compare 3.0 indy vs 4.0 vs 5.0 to see the impacts of
> non-indy changes.  In general, the bytecode changes that were made were to
> improve performance.  But there were some null-correctness changes required.
>
> Comparisons should factor out the compilation step.  It may be the case
> that compilation is slower, but runtime performance is the main concern IMO.
>
>
> If it is found that the switch to indy is the main source of performance
> impact, then we would need to look very closely at the design and
> implementation of call site binding.  The code is a bit complex.  I have
> considered inlining all of the call site binding code under
> IndyInterface#fromCache to see if some of the call depth and handoffs can
> be reduced.  This would just be an experiment to see if some specific part
> of bootstrapping shows as a hot spot.
>
>
> ------------------------------
> *From:* Jochen Theodorou <[email protected]>
> *Sent:* Wednesday, January 14, 2026 6:28 AM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: Groovy > 3 performance
>
> On 14.01.26 12:27, Gianluca Sartori wrote:
> [...]
> > I've asked ChatGPT, how much of this is true? How can we move forward
> > thinking about performance as something important to consider for the
> > language?
> >
> > *Q: *Why is Groovy 5 slower than Groovy 3?
> >
> > *A:*
> >
> > The reason isn’t a single bug, but rather a combination of architectural
> > and implementation changes. Let me break it down carefully.
> >
> > ------------------------------------------------------------------------
> >
> >
> >       1. *AST Transformations and Compilation Changes*
> >
> > Groovy 5 introduced a *new compilation infrastructure* that relies more
> > heavily on *AST (Abstract Syntax Tree) transformations*. While this
> > makes the language more flexible and enables new features (like better
> > type checking and improved static compilation), it also *adds overhead
> > at runtime*, especially for dynamic code.
> >   * In Groovy 3, some dynamic method calls were optimized in a simpler way.
> >   * Groovy 5 performs *additional type inference and checking*, even at
> >     runtime for some constructs, which can slow execution.
>
> The Groovy 5 point is nonsense. Was it simpler in Groovy 3?
> Maybe, but Java supports it less and less. Actually, what we removed is
> the alternative bytecode path for unchanged meta classes. This path did
> help a lot when it comes to handling primitives and tight loops using
> primitives. But the removal is more of a simplification, and it also means
> not having the overhead of that path.
>
> > ------------------------------------------------------------------------
> >
> >
> >       2. *JVM Dynamic Call Optimizations*
> >
> > Groovy 5 changed its *call site caching mechanism*:
> >
> >   * Groovy 3 used *classic call site caching* that was very fast for
> >     repeated method calls.
> >   * Groovy 5 introduced a *more sophisticated but heavier mechanism* to
> >     improve correctness with new language features (like traits and
> >     records).
> >   * This improves *compatibility and correctness*, but can be slower for
> >     tight loops with many dynamic calls.
>
> The point about traits and records is nonsense. Groovy 5 is still very
> fast, if not faster, for repeated method calls. Actually, Groovy 5 may be
> even better at optimizing tight loops. The problem is that the
> initialization is too expensive.
>
> > ------------------------------------------------------------------------
> >
> >
> >       3. *Library and API Changes*
> >
> > Groovy 5 upgraded many core libraries and dependencies, which sometimes
> > means:
> >
> >   * Using newer, safer APIs that are a bit slower.
> >   * Removing some low-level optimizations present in Groovy 3 for
> >     backward compatibility reasons.
>
> Is this pure speculation? What library or dependency did we change that
> changes runtime performance? None, except maybe the move away from
> bytecode generation to invokedynamic.
>
> > ------------------------------------------------------------------------
> >
> >
> >       4. *Static Compilation Differences*
> >
> > If you used |@CompileStatic| in Groovy 3:
> >
> >   * The bytecode generated in Groovy 5 can sometimes be *larger and less
> >     optimized*, because Groovy 5 focuses on *full feature support*,
> >     sometimes at the expense of raw performance.
> >   * Groovy 5’s static compilation is closer to Java semantics, which
> >     adds minor overhead in certain dynamic-heavy scenarios.
>
> No idea what that AI is talking about here.
>
> > ------------------------------------------------------------------------
> >
> >
> >       5. *Benchmarks and Real-World Performance*
> >
> >   * For purely *dynamic Groovy scripts*, Groovy 5 tends to be slower than 3.
> >   * For *statically compiled code*, the difference is smaller, but
> >     sometimes Groovy 5 can even be slightly slower due to the AST overhead.
> >   * Microbenchmarks in loops and repeated method calls show *Groovy 5
> >     can be 20–50% slower* in some cases.
>
> AST overhead plays only a role if you add the compilation into the
> microbenchmark. Then it is not so micro anymore imho. And it is a one
> time cost. Groovy is no AST interpreter after all. As for the other
> statement... I think I know worse cases. The question is how common they
> are.
>
> > ------------------------------------------------------------------------
> >
> > ✅ *In short:*
> > Groovy 5 trades raw runtime speed for *better correctness, new language
> > features, and more powerful static compilation*. For heavy dynamic
> > Groovy code, this can feel noticeably slower than Groovy 3.
>
> I too noticed that ChatGPT is quite bad when it comes to Groovy. Is
> there so much misinformation out there?
>
> >     1️⃣ JVM Flags to Monitor |invokedynamic|
> >
> > To count and monitor invokedynamic usage:
> >
> > |-XX:+PrintCallSiteCounters -XX:+UnlockDiagnosticVMOptions
> > -XX:+LogCompilation -XX:+PrintInlining |
> >
> >   * |-XX:+PrintCallSiteCounters| → shows how many times each call site
> >     was created or invoked
> >   * |-XX:+LogCompilation| + |-XX:+PrintInlining| → lets you see which
> >     calls the JIT inlines
> >
> > When running Groovy, pass them to the JVM with |-J|:
> >
> > |groovy -J-XX:+PrintCallSiteCounters myscript.groovy |
>
> not sure about PrintCallSiteCounters, since I do not know how indy is
> handled by this. But ok.
>
> > ------------------------------------------------------------------------
> >
> >
> >     2️⃣ Test Script
> >
> > Here’s a small script that tends to slow down in Groovy 4:
> >
> > // test.groovy
> > 10_000.times { i ->
> >     def a = "string"
> >     def b = a + i
> >     println b
> > }
>
> sarcasm mode on... oh, println in a tight loop... surely that will not
> have any impact whatsoever... sarcasm mode off. Well I guess it is not
> about time
>
> >   * Uses *dynamic operators* (|+|)
> >   * Uses *variable types*
> >   * Calls |println| dynamically
> >
> > This simulates a *hot loop* with many dynamic call sites.
> >
> > ------------------------------------------------------------------------
> >
> >
> >     3️⃣ Run with Groovy 3 vs Groovy 4
> >
> > # Groovy 3
> > groovy -J-XX:+PrintCallSiteCounters test.groovy
> > # Groovy 4
> > groovy -J-XX:+PrintCallSiteCounters test.groovy
> >
> > ------------------------------------------------------------------------
> >
> >
> >     4️⃣ Sample Output
> >
> > CallSite counters:
> > invokedynamic site #1: 10000 invocations, 0 invalidations
> > invokedynamic site #2: 10000 invocations, 50 invalidations
> > invokedynamic site #3: 10000 invocations, 200 invalidations
> >
> >   * *invocations* → how many times the call site was called
> >   * *invalidations* → how many times the call site had to be recreated
> >
> > *Analysis*:
> >
> >   * Groovy 3: few call sites, few invalidations → loop runs fast
> >   * Groovy 4: more dynamic call sites + more invalidations → overhead
> >     increases → slower performance
>
> We need specific output to talk about this. I know only that
> PrintCallSiteCounters does show more than just invokedynamic sites.
> invalidations is interesting, but as I said, this would have to be
> discussed with specific output.
>
> [...]
> > ------------------------------------------------------------------------
> >
> >
> >     ✅ Key Takeaways
> >
> >  1. |invokedynamic| itself is *not slow*
> >  2. Groovy 4 is slower than Groovy 3 because:
> >       * it uses *more invokedynamic call sites*
> >       * more *call site invalidations*
> >       * the *MOP* is more dynamic and general
>
> scratch that last point imho.
>
>
> [...]
>
> bye Jochen
>
>
>
