On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad <[email protected]> wrote:
> This patch suggests a workaround to an issue with huge SCF MH expression
> trees taking excessive JIT compilation resources by reviving (part of) the
> simple bytecode-generating strategy that was originally available as an
> all-or-nothing strategy choice.
>
> Instead of reintroducing a binary strategy choice I propose a threshold
> parameter, controlled by
> `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions
> below or at this threshold there's no change, for expressions with an arity
> above it we use the `StringBuilder`-chain bytecode generator.
>
> There are a few trade-offs at play here which influence the choice of
> threshold. The simple high arity strategy will for example not see any reuse
> of LambdaForms but strictly always generate a class per indy callsite, which
> means we might end up with a higher total number of classes generated and
> loaded in applications if we set this value too low. It may also produce
> worse performance on average. On the other hand there is the observed
> increase in C2 resource usage as expressions grow unwieldy. On the other
> other hand high arity expressions are likely rare to begin with, with less
> opportunities for sharing than the more common low-arity expressions.
>
> I turned the submitted test case into a few JMH benchmarks and did some
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
>
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
>
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
>
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb)
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource
> pressure at a reasonable level (< 18M) while avoiding perturbing performance
> at the vast majority of call sites.
>
> I was asked to use the new class file API for mainline. There's a version of
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis
> for a backport.
src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1430:
> 1428: cb.new_(STRING_BUILDER);
> 1429: cb.dup();
> 1430: cb.invokespecial(STRING_BUILDER, "<init>",
> MethodTypeDesc.ofDescriptor("()V"));
Would there be value in initializing to a larger capacity? Given the number of
append calls, seems the default cap of 16 is unlikely to be sufficient.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1561966067