Re: [I] Groovy - invoke dynamic performance problems [grails-core]

via GitHub Mon, 19 Jan 2026 11:28:13 -0800


jamesfredley commented on issue #15293:
URL: https://github.com/apache/grails-core/issues/15293#issuecomment-3769803396


   Classic Call Site (Non-Indy) - Key Characteristics
   1. Call site is replaced on first call (CallSiteArray.defaultCall → 
createCallSite → replaceCallSite)
      - The call site object in the array is replaced with a specialized 
version (e.g., PojoMetaMethodSite)
      - This specialized site caches: receiver class, metaclass, method, and 
expected parameter classes
   2. Guard check is simple inline code (see PojoMetaMethodSite.checkCall):
         return receiver.getClass() == metaClass.getTheClass() // receiver class matches
             && checkPojoMetaClass()                           // metaclass version unchanged
             && MetaClassHelper.sameClasses(params, args);     // argument classes match
      
   3. On guard failure: falls back to CallSiteArray.defaultCall which will 
re-resolve and replace the call site again
   4. Direct method invocation: metaMethod.doMethodInvoke(receiver, args) or 
direct Method.invoke
   
   Indy Call Site - Key Differences
   1. Uses MutableCallSite with MethodHandle target - more complex structure
   2. Guards are MethodHandle chains: MethodHandles.guardWithTest(test, handle, 
fallback)
      - Each guard adds overhead in the method handle chain
   3. Multiple guards are chained: metaclass guard → switchpoint guard → 
argument type guards
   4. Cache is a LinkedHashMap with soft references - synchronized access
   5. On guard failure: goes through selectMethod which does full method 
resolution
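
   To make the guard-chain structure concrete, here is a minimal sketch using only the JDK's java.lang.invoke API. It is not Groovy's actual code: the class GuardChainDemo and the methods sameClass, fastPath, fallback, buildSite and invoke are hypothetical names chosen for illustration, and the sketch uses one receiver-class guard plus a SwitchPoint rather than the full metaclass/switchpoint/argument chain.

   ```java
   import java.lang.invoke.MethodHandle;
   import java.lang.invoke.MethodHandles;
   import java.lang.invoke.MethodType;
   import java.lang.invoke.MutableCallSite;
   import java.lang.invoke.SwitchPoint;

   // Hypothetical demo class; not part of Groovy.
   final class GuardChainDemo {
       // Guard test: does the receiver have the expected class?
       static boolean sameClass(Class<?> expected, Object receiver) {
           return receiver != null && receiver.getClass() == expected;
       }

       // Stand-in for the resolved method.
       static Object fastPath(Object receiver) { return "fast:" + receiver; }

       // Stand-in for the slow path (full method selection in a real runtime).
       static Object fallback(Object receiver) { return "fallback:" + receiver; }

       // Builds switchPoint -> class guard -> fastPath, with fallback on any failure.
       static MutableCallSite buildSite(Class<?> expected, SwitchPoint switchPoint) {
           try {
               MethodHandles.Lookup lookup = MethodHandles.lookup();
               MethodType objToObj = MethodType.methodType(Object.class, Object.class);
               MethodHandle fast = lookup.findStatic(GuardChainDemo.class, "fastPath", objToObj);
               MethodHandle slow = lookup.findStatic(GuardChainDemo.class, "fallback", objToObj);
               MethodHandle test = lookup.findStatic(GuardChainDemo.class, "sameClass",
                       MethodType.methodType(boolean.class, Class.class, Object.class))
                       .bindTo(expected);
               MethodHandle classGuarded = MethodHandles.guardWithTest(test, fast, slow);
               MethodHandle guarded = switchPoint.guardWithTest(classGuarded, slow);
               return new MutableCallSite(guarded);
           } catch (ReflectiveOperationException e) {
               throw new IllegalStateException(e);
           }
       }

       // Convenience wrapper so callers need not declare Throwable.
       static Object invoke(MutableCallSite site, Object receiver) {
           try {
               return site.getTarget().invoke(receiver);
           } catch (RuntimeException | Error e) {
               throw e;
           } catch (Throwable t) {
               throw new IllegalStateException(t);
           }
       }
   }
   ```

   With a site built for Integer.class, an Integer receiver takes the fast path and any other receiver takes the fallback. After SwitchPoint.invalidateAll is called on the switch point, every call takes the fallback. Each additional guard in a real chain adds one more guardWithTest layer of this kind.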
   
   Key Performance Differences
   
   The classic approach has several advantages:
   
   1. Simple polymorphic dispatch: If receiver class changes, the site is just 
replaced with a new specialized one - no complex cache lookup
   2. No MethodHandle overhead: Ordinary Java method calls vs method handle chains
   3. Simpler guard checks: Inline if statements vs MethodHandles.guardWithTest 
chains
   4. Per-call-site specialization: Each call site becomes optimized for its 
most common case
   
   PojoMetaMethodSite.java
   When the guard fails (line 59), it calls CallSiteArray.defaultCall(this, 
receiver, args) which will:
   1. Call createCallSite 
   2. Create a NEW specialized call site for the new receiver/args types
   3. Replace the old call site in the array with the new one
   4. Invoke the new call site
   
   This means that a polymorphic call site under the classic approach is
   replaced over and over: the site is always monomorphic, but it "thrashes"
   between different specializations. That is less efficient than a proper
   polymorphic inline cache (PIC), yet the overhead of thrashing is apparently
   still lower than indy's guard-chain overhead.
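
   The replace-on-miss behavior described above can be sketched in plain Java. This is a simplified illustration, not Groovy's implementation: Site, SiteArray and SpecializedSite are hypothetical stand-ins for CallSite, CallSiteArray and PojoMetaMethodSite, the guard checks only the receiver class, and the "resolved method" is always toString looked up by reflection.

   ```java
   import java.lang.reflect.Method;

   // Hypothetical, simplified stand-in for Groovy's CallSite.
   interface Site {
       Object call(Object receiver);
   }

   // Hypothetical, simplified stand-in for Groovy's CallSiteArray.
   final class SiteArray {
       final Site[] sites;
       int replacements = 0; // how many times a slot was (re)specialized

       SiteArray(int size) {
           sites = new Site[size];
           for (int i = 0; i < size; i++) {
               final int index = i;
               // Unresolved site: the first call always goes to defaultCall.
               sites[i] = receiver -> defaultCall(index, receiver);
           }
       }

       // Analogue of defaultCall: resolve, replace the slot, invoke the new site.
       Object defaultCall(int index, Object receiver) {
           Site specialized = new SpecializedSite(this, index, receiver.getClass());
           sites[index] = specialized;
           replacements++;
           return specialized.call(receiver);
       }
   }

   // A site specialized for exactly one receiver class.
   final class SpecializedSite implements Site {
       private final SiteArray owner;
       private final int index;
       private final Class<?> expected; // cached receiver class
       private final Method method;     // cached resolved method

       SpecializedSite(SiteArray owner, int index, Class<?> expected) {
           this.owner = owner;
           this.index = index;
           this.expected = expected;
           try {
               this.method = expected.getMethod("toString");
           } catch (NoSuchMethodException e) {
               throw new IllegalStateException(e);
           }
       }

       @Override
       public Object call(Object receiver) {
           if (receiver.getClass() == expected) {     // inline guard
               try {
                   return method.invoke(receiver);    // cached target
               } catch (ReflectiveOperationException e) {
                   throw new IllegalStateException(e);
               }
           }
           return owner.defaultCall(index, receiver); // guard failed: re-specialize
       }
   }
   ```

   Calling the same slot with an Integer, another Integer, a String and then an Integer again specializes the slot three times: the second call hits the guard, while every change of receiver class replaces the site. That is the thrashing pattern, with a guard check and an array store as the cost of each miss in this sketch.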
   
   What makes indy's guards more expensive:
   IndyGuardsFiltersAndSignatures.java
   
   
   Summary: Why Non-Indy is 5-6x Faster
   
   Classic Call Site Approach (Non-Indy)
   ```
   ┌─────────────────────────────────────────────────────────────┐
   │                    CallSiteArray                             │
   │  ┌──────────┐ ┌──────────┐ ┌──────────┐                     │
   │  │CallSite 0│ │CallSite 1│ │CallSite 2│ ...                 │
   │  └────┬─────┘ └────┬─────┘ └────┬─────┘                     │
   │       │            │            │                            │
   │       ▼            ▼            ▼                            │
   │  Specialized  Specialized  Specialized                       │
   │  (replaced on (replaced on (replaced on                      │
   │   first call)  first call)  first call)                      │
   └─────────────────────────────────────────────────────────────┘
   ```
   
   Flow:
   1. call(receiver, args)
   2. checkCall() → simple inline if-checks
   3. If pass: invoke() → direct method call
   4. If fail: CallSiteArray.defaultCall() 
      → create NEW specialized site
      → REPLACE in array
      → invoke new site
   
   Key characteristics:
   - Monomorphic but adaptive: Each site is specialized for ONE type, replaced 
when type changes
   - Simple guard checks: Inline Java if statements
   - Direct invocation: metaMethod.invoke() or Method.invoke()
   - No method handle overhead: Ordinary Java method calls, no guardWithTest chain
   
   Indy Call Site Approach
   ```
   ┌─────────────────────────────────────────────────────────────┐
   │                  CacheableCallSite                           │
   │  target: MethodHandle (guarded chain)                        │
   │  fallbackTarget: MethodHandle → selectMethod                 │
   │  lruCache: LinkedHashMap<String, SoftRef<MHWrapper>>         │
   │  latestClassName: volatile String (fast-path)                │
   └─────────────────────────────────────────────────────────────┘
   ```
   Guard Chain:
   ```
   ┌──────────────────────────────────────────────────────────────┐
   │  guardWithTest(metaclassGuard,                               │
   │    guardWithTest(switchPointGuard,                           │
   │      guardWithTest(sameClassesGuard,                         │
   │        actualMethodHandle,                                   │
   │        fallback),                                            │
   │      fallback),                                              │
   │    fallback)                                                 │
   └──────────────────────────────────────────────────────────────┘
   ```
   
   Flow:
   1. fromCache(callSite, sender, methodName, ...)
   2. buildCacheKey(arguments) → String concatenation
   3. synchronized(lruCache) → cache lookup
   4. If miss: fallback() → full method resolution via Selector
   5. On cache hit: mhw.getDirectMethodHandle().invokeExact(args)
      → But guards still in chain!
   6. If guard fails: fallback to selectMethod()
   
   Key problems:
   1. Method handle chains are slow: Each guardWithTest adds overhead
   2. Cache lookup overhead: Even the fast path involves a volatile read and
      an equals() call
   3. String concatenation for cache keys: buildCacheKey() creates strings
   4. Synchronized map access: LRU cache requires synchronization
   5. Guard failures cascade: One failed guard triggers full fallback
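
   Problems 2 to 4 can be seen in a small sketch of the cache shape described above: a synchronized, access-ordered LinkedHashMap keyed by a concatenated string and holding soft references. This is an illustration under assumptions, not the CacheableCallSite source: SoftLruCache, buildKey, getOrCompute and entryCount are hypothetical names, and the key format is invented for the example.

   ```java
   import java.lang.ref.SoftReference;
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.function.Function;

   // Hypothetical sketch of a string-keyed, soft-valued LRU cache.
   final class SoftLruCache<V> {
       private final Map<String, SoftReference<V>> map;

       SoftLruCache(final int capacity) {
           // accessOrder = true makes iteration order least-recently-used first.
           this.map = new LinkedHashMap<String, SoftReference<V>>(16, 0.75f, true) {
               @Override
               protected boolean removeEldestEntry(Map.Entry<String, SoftReference<V>> eldest) {
                   return size() > capacity;
               }
           };
       }

       // Builds a key from class names; every call allocates a new String.
       static String buildKey(Object receiver, Object[] args) {
           StringBuilder sb = new StringBuilder(className(receiver));
           for (Object arg : args) {
               sb.append('|').append(className(arg));
           }
           return sb.toString();
       }

       private static String className(Object o) {
           return o == null ? "null" : o.getClass().getName();
       }

       // Every lookup takes the lock, even on a hit.
       V getOrCompute(String key, Function<String, V> resolver) {
           synchronized (map) {
               SoftReference<V> ref = map.get(key);
               V value = (ref == null) ? null : ref.get();
               if (value == null) { // miss, or the soft reference was cleared
                   value = resolver.apply(key);
                   map.put(key, new SoftReference<>(value));
               }
               return value;
           }
       }

       int entryCount() {
           synchronized (map) {
               return map.size();
           }
       }
   }
   ```

   In this sketch a hit still costs a string build, a lock acquisition, a hash lookup and a soft-reference dereference, whereas the classic call site keeps its cached state in the site object itself.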
   
   Why the Difference is So Large
   
   For polymorphic call sites (like collection operations on different domain 
types):
   ```
   | Aspect | Classic | Indy |
   |--------|---------|------|
   | Guard check | Inline if (class == expected) | MethodHandle chain traversal |
   | On guard fail | Replace site, immediate re-invoke | Full selectMethod() path |
   | Cache lookup | None (site IS the cache) | Map lookup + soft ref deref |
   | Method invoke | Direct virtual call | MethodHandle.invokeExact() |
   | Memory allocation | Minimal | String keys, MH adapters |
   ```
   The classic approach "thrashes" by constantly replacing the call site, but
   each replacement is cheap. The indy approach tries to be smarter with
   caching, but the overhead of the mechanism exceeds the benefit.
   
   Potential Improvements for Indy
   1. Polymorphic inline cache (PIC): Instead of a single guarded target,
      maintain multiple targets (like V8's PICs)
   2. Megamorphic fallback: After a threshold, switch to unguarded dispatch
      via the metaclass
   3. Simpler guards: Receiver-only guard for non-overloaded methods
   4. Remove guard chain: Use computed goto / tableswitch for type dispatch
   5. JIT-friendly patterns: Structure code to help HotSpot optimize
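
   Improvements 1 and 2 could look roughly like the following sketch, which builds a bounded polymorphic inline cache on top of MutableCallSite and switches to unguarded dispatch once a depth threshold is reached. Everything here is an assumption made for illustration: PicCallSite, MAX_DEPTH, miss, generic and call are hypothetical names, the dispatched method is always toString on a public receiver class, and thread safety is ignored.

   ```java
   import java.lang.invoke.MethodHandle;
   import java.lang.invoke.MethodHandles;
   import java.lang.invoke.MethodType;
   import java.lang.invoke.MutableCallSite;

   // Hypothetical sketch; not thread-safe and not Groovy's design.
   final class PicCallSite extends MutableCallSite {
       static final int MAX_DEPTH = 4; // hypothetical megamorphic threshold
       private static final MethodType TYPE =
               MethodType.methodType(Object.class, Object.class);
       private static final MethodHandle MISS;
       private static final MethodHandle GENERIC;
       private static final MethodHandle IS_CLASS;

       static {
           try {
               MethodHandles.Lookup lookup = MethodHandles.lookup();
               MISS = lookup.findVirtual(PicCallSite.class, "miss", TYPE);
               GENERIC = lookup.findStatic(PicCallSite.class, "generic", TYPE);
               IS_CLASS = lookup.findStatic(PicCallSite.class, "isClass",
                       MethodType.methodType(boolean.class, Class.class, Object.class));
           } catch (ReflectiveOperationException e) {
               throw new ExceptionInInitializerError(e);
           }
       }

       int depth = 0;               // number of guarded entries installed
       boolean megamorphic = false; // true once guards have been abandoned

       PicCallSite() {
           super(TYPE);
           setTarget(MISS.bindTo(this)); // start with an empty cache
       }

       static boolean isClass(Class<?> expected, Object receiver) {
           return receiver != null && receiver.getClass() == expected;
       }

       // Unguarded dispatch, standing in for a call through the metaclass.
       static Object generic(Object receiver) {
           return String.valueOf(receiver);
       }

       // Reached when no installed guard matches the receiver.
       Object miss(Object receiver) throws Throwable {
           if (receiver == null) {
               return generic(receiver);
           }
           if (depth >= MAX_DEPTH) {
               megamorphic = true;
               setTarget(GENERIC); // drop the whole guard chain
               return generic(receiver);
           }
           Class<?> cls = receiver.getClass();
           MethodHandle resolved = MethodHandles.publicLookup()
                   .findVirtual(cls, "toString", MethodType.methodType(String.class))
                   .asType(TYPE);
           // Prepend a new entry; the previous chain becomes the fallback.
           setTarget(MethodHandles.guardWithTest(IS_CLASS.bindTo(cls), resolved, getTarget()));
           depth++;
           return resolved.invoke(receiver);
       }

       // Convenience wrapper so callers need not declare Throwable.
       Object call(Object receiver) {
           try {
               return getTarget().invoke(receiver);
           } catch (RuntimeException | Error e) {
               throw e;
           } catch (Throwable t) {
               throw new IllegalStateException(t);
           }
       }
   }
   ```

   Up to four receiver classes each get their own guarded entry, so a receiver class that has been seen before never reaches miss again. The fifth distinct class flips the site to the generic handle, which trades per-type specialization for a fixed, guard-free cost. Whether this beats the current design would need to be measured against the benchmarks in this issue.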


