Re: [I] Groovy - invoke dynamic performance problems [grails-core]

via GitHub Wed, 07 Jan 2026 13:18:22 -0800


jamesfredley commented on issue #15293:
URL: https://github.com/apache/grails-core/issues/15293#issuecomment-3720854215


   I had Grok analyze those two 100MB logs.
   
   
   ## application startup and initial data loading log
   
   ### Analysis of the Indy Logging Output
   
   The provided log (from `performance-test-app.log`) captures verbose 
INFO-level output from Groovy's invokedynamic (indy) system during the 
execution of the `/performanceTest/benchmark` endpoint in the Grails 7 
performance regression app. This logging was enabled via 
`-Dgroovy.indy.logging=true` and `-Dgroovy.indy.debug=true`, as per your 
command. The log is extensive (truncated in the query but fully processed via 
tools), showing repeated patterns of indy bootstrap operations. Below, I break 
down key insights, why callsites remain uncached, and implications for 
performance.
   
   #### Key Statistics from the Log
   - **Total "call site stays uncached" occurrences**: 1497. This indicates 
that nearly every indy callsite bootstrapped during the test failed to cache, 
leading to repeated overhead.
   - **Invocation types**:
     - CAST: 1306 (dominant; these are type casts, e.g., casting domain objects 
to interfaces like `DirtyCheckable` for change tracking).
     - METHOD: 148 (method calls, e.g., `hasChanged`, `getEntry`, or 
GORM-specific methods like 
`org_grails_datastore_gorm_GormEntityDirtyCheckabletrait$super$hasChanged`).
   - **Guards added**: These are conditions checked before invoking the target. 
Common ones include:
     - `added meta class equality check` (checks if the metaclass hasn't 
changed; frequent due to Groovy's dynamic MOP).
     - `added switch point guard` (JVM-level invalidation hook for changes like 
class redefinition).
     - `added same-class argument check` (ensures argument types match expected 
classes).
     - `added GroovyRuntimeException unwrapper` (handles exception wrapping in 
Groovy calls).
     - Less common: `added class equality check`, `added null argument check at 
pos 0`, `added category-in-current-thread-guard for category method` (for 
thread-local categories).
   - **Target types**: Primarily GORM-related, such as:
     - `(DirtyCheckable)FieldHelper` (most common; for dirty checking in domain 
objects).
     - `(Object)DirtyCheckable`, `(GormValidateable)FieldHelper`, 
`(Validator)boolean`, `(Object)Constraint`.
     - Others like `(Object)GormInstanceApi`, `(Object)Validator`, 
`(Object)Errors`, reflecting validation, persistence, and trait access.
   - **Argument classes**: Focused on domain entities and GORM internals:
     - Domain: `grails7.performance.Company`, `Department`, `Employee`, 
`Project`, `Task`, `Milestone`.
     - GORM/Hibernate: `org.grails.orm.hibernate.HibernateGormInstanceApi`, 
`org.grails.datastore.gorm.validation.constraints.*` (e.g., `BlankConstraint`, 
`MaxSizeConstraint`), 
`org.hibernate.engine.internal.StatefulPersistenceContext`, 
`org.grails.datastore.mapping.validation.ValidationErrors`.
   
   The log starts at app startup/boot but explodes during the benchmark loop, 
where GORM operations (e.g., creating/saving/querying interconnected domains 
like Company → Department → Employee → Project → Task → Milestone) trigger 
thousands of dynamic dispatches.
   
   #### Why Do Call Sites Stay Uncached?
   In Groovy 4's indy system, callsites are bootstrapped the first time an 
invocation occurs (via `IndyInterface.bootstrap`). Guards are added to allow 
caching if conditions (e.g., types, metaclasses) remain stable. However, if the 
selector (in `org.codehaus.groovy.vmplugin.v8.Selector$MethodSelector`) 
determines the callsite is too volatile, it "stays uncached," falling back to 
full method selection on every call—similar to legacy call site behavior but 
without the legacy caching benefits.
   
   Reasons for uncaching based on this log:
   - **High polymorphism and dynamic behavior**: GORM domains use traits (e.g., 
`DirtyCheckable`, `GormEntity`) that involve dynamic method resolution. Each 
new instance (created per benchmark iteration) may have slight variations in 
metaclasses or state, triggering guards like "meta class equality check." If 
polymorphism exceeds thresholds (e.g., multiple subtypes or changes), caching 
is skipped to avoid constant invalidations.
   - **CAST-heavy workload**: 87% of invocations are casts (e.g., to 
`DirtyCheckable` for change detection during saves). These are common in GORM 
for property access/validation but aren't cacheable if types vary or if they're 
one-off (low invocation count per unique site).
   - **No optimizations triggered**: With your potential tunings (e.g., low 
`groovy.indy.optimize.threshold`), sites should optimize after a few calls, but 
the log shows no "call site optimized" or caching success. This suggests 
inherent volatility—e.g., metaclass modifications from GORM plugins or 
categories.
   - **Benchmark design**: The test creates fresh objects each iteration (e.g., 
new `Task@...`, `Company@...`), leading to new callsites per run. If 
types/arguments differ slightly, guards fail, preventing caching.
   - **Source code insight**: From Groovy codebase (`IndyInterface.java`), 
uncaching occurs if `doCallSiteTargetSet` decides the target isn't stable 
(e.g., abstract methods, categories, or high fallback risk). Here, many targets 
are abstract/interfaces (e.g., `PersistenceContext.getEntry`), which are 
dynamic.
   
   This uncaching is a key contributor to the 2-4x slowdown in Grails 7 (indy) 
vs. legacy modes, as each invocation incurs bootstrap-like overhead without JVM 
JIT benefits from repeated stable calls.
   
   #### Performance Implications
   - **Overhead**: Uncached sites mean slower dispatch (full MOP lookup each 
time) vs. legacy's pre-cached call sites. In data-intensive loops like this 
benchmark (e.g., 50 iterations of GORM saves/queries), this amplifies 
regressions.
   - **Comparison to legacy**: Legacy Groovy used explicit call site objects 
with caching; indy relies on JVM for optimization but fails here due to 
dynamics. Flame graphs from the original issue (#15293) show time in 
`vmplugin.v8` paths, matching this log.
   - **Why more logging than expected?**: You mentioned limited output 
initially, but this log is bursty—startup has some, but the test triggers 
thousands as new invocations hit. If you ran with default warmup/iterations, 
expect ~1.3k CASTs alone.
   
   #### Recommendations for Improvements
   To make indy faster (approaching legacy):
   1. **Static compilation**: Add `@CompileStatic` to domain classes (e.g., 
`Company.groovy`), the `PerformanceTestController`, or services. This bypasses 
dynamic dispatch for casts/methods, reducing indy reliance.
   2. **Tune thresholds aggressively**: Set `-Dgroovy.indy.optimize.threshold=0 
-Dgroovy.indy.fallback.threshold=0` to force early optimization and disable 
fallbacks. Rerun the benchmark and check if "stays uncached" decreases.
   3. **Disable indy**: For testing, add to `build.gradle`:
      ```
      tasks.withType(GroovyCompile) {
          groovyOptions.optimizationOptions.indy = false
      }
      ```
      This reverts to legacy bytecode, potentially resolving the regression (as 
per issue benchmarks).
   4. **Codebase tweaks in Groovy**: Easy spots (from earlier analysis):
      - In `IndyInterface.java`, lower hardcoded thresholds or add more 
adaptive caching for CASTs.
      - Optimize `linkCallSite` for GORM-like workloads by pre-caching common 
trait interfaces.
      - Contribute to GROOVY JIRA (e.g., extend GROOVY-8298 fixes).
   5. **Diagnostics**: Run with flame graphs (e.g., via async-profiler) to 
confirm indy overhead. If using low thresholds, monitor for invalidations (not 
logged here, but possible with custom logging).
   
   If you share more details (e.g., full command, iteration count, or flame 
graphs), I can refine further. This log confirms indy isn't caching effectively 
in this GORM-heavy scenario, but the tunables and static compilation can 
mitigate it.
   
   ## log while running localhost:8080/performanceTest/benchmark?pretty=true
   
   ### Analysis of the New Indy Logging Output
   
   This log (from `performance-test-app.log`) captures an even larger burst of 
indy activity during the `/performanceTest/benchmark` endpoint execution, with 
a total of ~243,549 lines (heavily repetitive). It focuses almost exclusively 
on a single pattern in a closure 
(`grails7.performance.PerformanceTestService$_getProjectMetrics_closure12$_closure61`),
 likely part of a metrics calculation loop over `Task` domain objects in the 
benchmark. Unlike the previous log (dominated by CASTs for dirty 
checking/validation), this one highlights property access and type coercion in 
a data-processing context.
   
   #### Key Statistics from the Log
   - **Total lines**: 243,549 (indicating high verbosity and many iterations; 
the benchmark likely ran with higher iterations, e.g., 50+).
   - **"call site stays uncached" occurrences**: 10,589 (every bootstrapped 
callsite remains uncached, causing repeated overhead).
   - **Invocation types**:
     - GET: 8,129 (primarily for the `estimatedHours` property on `Task` 
objects).
     - CAST: 2,460 (exclusively casting the returned `Integer` from 
`estimatedHours` to `boolean`, likely for a conditional like `if 
(task.estimatedHours)`).
   - **Specific patterns**:
     - `estimatedHours` GET invocations: 2,461 (one extra possibly from an edge 
case).
     - Boolean CASTs: 2,460 (matches CAST count; each GET is followed by a CAST 
to boolean via DefaultGroovyMethods `asBoolean` on `Number`).
   - **Unique `Task` instances**: 75 (accesses are repeated across a small set 
of objects, but the dynamic nature prevents reuse).
   - **Guards added** (common checks before invocation; counts reflect 
repetition):
     - GroovyRuntimeException (unwrapper): 10,589
     - switch (switch point for invalidations): 10,589
     - same-class (argument class check): 10,589
     - meta (meta class equality check): 8,128 (mostly on GETs for `Task`, 
where metaclasses are dynamic).
   
   The log spans ~11 seconds (17:47:50 to 17:48:01), with bursts corresponding 
to loop iterations processing `Task` objects (e.g., summing hours for project 
metrics). The sender is a closure in `PerformanceTestService`, suggesting code 
like:
   ```groovy
   // Hypothetical snippet from getProjectMetrics
   tasks.each { task ->
       if (task.estimatedHours) { // Triggers GET + CAST to boolean
           // Accumulate metrics
       }
   }
   ```
   
   #### Why Do Call Sites Stay Uncached?
   Based on the Groovy 4 source code (`Selector.java` and 
`IndyInterface.java`), callsites remain uncached due to a deliberate decision 
in `MethodSelector.doCallSiteTargetSet()`:
   - The `cache` flag is set to `false` if:
     - `spreadCall` is true (argument spreading, e.g., `list*.method()`—not the 
case here, as `spreadCall: false`).
     - The receiver's class has per-instance metaclasses 
(`ClassInfo.getClassInfo(receiver.getClass()).hasPerInstanceMetaClasses()` 
returns true).
   - For the GET on `estimatedHours`:
     - Receiver is `grails7.performance.Task` (a GORM domain class).
     - GORM enhances domains with dynamic traits and methods (e.g., via 
`ExpandoMetaClass` or plugins), marking them as having per-instance 
metaclasses. This makes caching unsafe, as metaclasses could change per object 
instance, leading to invalidations.
     - Meta class in log: `groovy.lang.MetaClassImpl` (standard, but the class 
info flags it as per-instance due to GORM's runtime modifications).
   - For the CAST to boolean:
     - Receiver is `java.lang.Integer` (stable JDK class, no per-instance 
metaclasses).
     - However, it's using a DefaultGroovyMethods (DGM) helper (`asBoolean` on 
`Number`), which is dynamic. The uncaching cascades from the context or is 
treated similarly due to the enclosing dynamic dispatch.
   - Other factors from source:
     - No abstract methods here (`estimatedHours` is a concrete getter; log 
shows "meta method is CachedMethod instance" and "successfully unreflected 
method").
     - No category methods (log: "meta method is category type method: false").
     - Hit counts aren't triggering optimization (controlled by 
`-Dgroovy.indy.optimize.threshold`, default 10,000)—but since `cache=false` 
initially, it never attempts to set the target.
     - Fallbacks aren't resetting here (no log of "call site target reset"), 
but high volatility could exceed `-Dgroovy.indy.fallback.threshold` (default 
10,000) in longer runs.
   
   This results in every invocation falling back to full method selection 
(heavyweight, blocking JVM threads as noted in related discussions), mimicking 
legacy behavior but without its caching efficiency. With 2,461+ accesses across 
only 75 unique `Task`s, stable caching could have reduced this to near-zero 
overhead after the first few calls.
   
   #### Performance Implications
   - **Regression source**: Uncached sites mean repeated bootstrapping/lookups, 
amplifying the 2-4x slowdown in GORM workloads. This log shows tighter focus on 
property access in loops, where indy should excel (via JVM JIT) but fails due 
to the per-instance metaclass flag.
   - **Comparison to previous log**: Earlier log had broader CASTs for dirty 
checking/validation (GORM internals); this is user-code level (service 
closure). Together, they highlight indy struggles with GORM's dynamism.
   - **Benchmark context**: The ~11-second span suggests the test loop (e.g., 
creating/querying tasks) is bottlenecked here. Flame graphs would show time in 
`vmplugin.v8.Selector` and MOP lookups.
   
   #### Easy-to-Spot Improvements in Groovy 4 Codebase
   From source analysis:
   - **Enhance caching for per-instance metaclasses**: In `Selector.java`, the 
`cache &= !hasPerInstanceMetaClasses` is conservative. Add an optional guard 
(e.g., instance-specific metaclass check) or a property 
`-Dgroovy.indy.cache.perinstance=true` to enable caching with extra safeguards. 
This could help GORM-like apps without breaking dynamics.
   - **Lower defaults for thresholds**: Hardcoded defaults (10,000 for 
optimize/fallback) are high for low-frequency calls. Make them configurable 
per-class or lower to 100-1,000 in `IndyInterface.java` for better out-of-box 
performance.
   - **Special handling for DGM casts**: For common CASTs like `asBoolean` on 
primitives, bypass uncaching if the receiver is a JDK class (e.g., add `if 
(receiver instanceof Number && !hasPerInstanceMetaClasses)` in 
`MethodSelector`).
   - **Logging enhancements**: Add reasons to the "call site stays uncached" 
log (e.g., "due to per-instance metaclasses") for easier debugging.
   These are low-risk changes in `Selector.java` and `IndyInterface.java`, 
testable via benchmarks like this app.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Groovy - invoke dynamic performance problems [grails-core]

Reply via email to