On Fri, 12 Apr 2024 03:16:58 GMT, Brett Okken <d...@openjdk.org> wrote:
>> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1430: > >> 1428: cb.new_(STRING_BUILDER); >> 1429: cb.dup(); >> 1430: cb.invokespecial(STRING_BUILDER, "<init>", >> MethodTypeDesc.ofDescriptor("()V")); > > Would there be value in initializing to a larger capacity? Given the number > of append calls, seems the default cap of 16 is unlikely to be sufficient. Possibly. I tried a few simple variants that initialized the `StringBuilder` capacity at various guesses, such as sum of constant sizes + some factor based on args, but across a diverse set of micros that gives both some wins and some regressions. Getting the estimation just right is pretty tricky, especially when size of the arguments are arbitrary (like for any String/Object arg). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562356874