Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

Claes Redestad Fri, 12 Apr 2024 08:02:43 -0700

On Fri, 12 Apr 2024 14:53:58 GMT, Chen Liang <[email protected]> wrote:


>> This patch suggests a workaround to an issue with huge SCF MH expression 
>> trees taking excessive JIT compilation resources by reviving (part of) the 
>> simple bytecode-generating strategy that was originally available as an 
>> all-or-nothing strategy choice. 
>> 
>> Instead of reintroducing a binary strategy choice I propose a threshold 
>> parameter, controlled by 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions 
>> below or at this threshold there's no change, for expressions with an arity 
>> above it we use the `StringBuilder`-chain bytecode generator. 
>> 
>> There are a few trade-offs at play here which influence the choice of 
>> threshold. The simple high arity strategy will for example not see any reuse 
>> of LambdaForms but strictly always generate a class per indy callsite, which 
>> means we might end up with a higher total number of classes generated and 
>> loaded in applications if we set this value too low. It may also produce 
>> worse performance on average. On the other hand there is the observed 
>> increase in C2 resource usage as expressions grow unwieldy. On the other 
>> other hand high arity expressions are likely rare to begin with, with less 
>> opportunities for sharing than the more common low-arity expressions. 
>> 
>> I turned the submitted test case into a few JMH benchmarks and did some 
>> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
>> 
>> Baseline strategy:
>> 13 args: 6.3M
>> 23 args: 18M
>> 123 args: 868M
>> 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
>> 13 args: 2.11M
>> 23 args: 3.67M
>> 123 args: 4.75M
>> 
>> For 123 args the memory overhead of the baseline strategy is 180x, but for 
>> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
>> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
>> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
>> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
>> at the vast majority of call sites.
>> 
>> I was asked to use the new class file API for mainline. There's a version of 
>> this patch implemented using ASM in 7c52a9f which might be a reasonable 
>> basis for a backport.
>
> src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 
> 1437:
> 
>> 1435:                         for (int c = 0; c < args.parameterCount(); 
>> c++) {
>> 1436:                             if (constants[c] != null) {
>> 1437:                                 cb.ldc(constants[c]);
> 
> I think for Remi's approach, we change:
> 1. Insert an extra String array (maybe need a way to mark it stable?) arg 
> representing constants
> 2. Change this ldc into aload + aaload (or List.get if we use immutable List)
> 3. Call `bindTo(constantStrings)` on the resulting MH
> 
> This approach can significantly reduce the number of classes spinned instead 
> of generating one class per constant array; might need to measure performance 
> to see if this is a good tradeoff
> 
> Oh, I just noticed that we need to erase everything to the generic method 
> type. I think Remi's "value class" means there's no overhead for converting 
> primitives into wrappers in this conversion to generic method type.

I'd prefer considering such optimizations as follow-ups. We need to think about 
where to define such shared classes in a way that considers the full lifecycle, 
facilitates class unloading (one cache per classloader?) while still getting 
good reuse.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562673638

Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

Reply via email to