On Tue, 7 Feb 2023 20:32:11 GMT, Claes Redestad <[email protected]> wrote:
>> src/java.base/share/classes/java/lang/String.java line 698:
>>
>>> 696: }
>>> 697:
>>> 698: static byte[] copyBytes(byte[] bytes, int offset, int length) {
>>
>> Given that the stub generated for array copy seems highly dependent on the
>> call-site constraints, did you try adding a check for offset == 0 and/or
>> length == bytes.length?
>>
>> if (offset == 0 && bytes.length == length) {
>>     System.arraycopy(bytes, 0, dst, 0, bytes.length);
>> } // etc. for the other combinations
>>
>> This should produce different generated stubs with much smaller ASM, depending
>> on the enforced constraints (and shouldn't greatly affect the code size of
>> the method, given that the stub won't be inlined AFAIK).
>>
>> Beware, as noted by others, I'm not suggesting that's the way to fix this,
>> but it would be interesting to check how much perf we leave on the table
>> due to this supposedly "inefficient" stub generation (if that's the issue).
>
> I did some quick experiments but saw no clear win from doing anything like
> this here. Feel free to experiment and see if there's some particular
> configuration that comes out ahead.
>
> FTR I did not intend for this RFE to solve
> https://bugs.openjdk.org/browse/JDK-8295496 completely, but provide a small,
> partial win that might possibly clear a path to solving that likely
> orthogonal issue.
I've created a separate benchmark for this (it has the same name as yours by
accident, given that I used yours as a blueprint):
https://gist.github.com/franz1981/658c2bf6796aab4ae04a84bef1ef34b6
The results are:
Benchmark                             (offset)  (size)  Mode  Cnt   Score    Error  Units
StringConstructor.arrayCopy                  0       7  avgt   10   9.519 ±  0.131  ns/op
StringConstructor.arrayCopy                  1       7  avgt   10   9.194 ±  0.232  ns/op
StringConstructor.copyOf                     0       7  avgt   10  11.548 ±  0.133  ns/op
StringConstructor.copyOf                     1       7  avgt   10   9.812 ±  0.018  ns/op
StringConstructor.optimizedArrayCopy         0       7  avgt   10   6.854 ±  0.355  ns/op  <---- THAT'S COOL
StringConstructor.optimizedArrayCopy         1       7  avgt   10   9.088 ±  0.049  ns/op
The optimized array copy appears to help C2 with stub generation: with
offset 0 it scores 6.854 ns/op, against 9.519 ns/op for the plain arrayCopy.
I haven't yet checked whether this applies to the `String` case, and I haven't
created a long enough dataset array to check the effect of the newly
introduced conditions on the branch predictor, but in terms of the generated
stub there is a difference.
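
For readers who don't want to open the gist, here is a minimal sketch of the
kind of special-casing being measured. It is not the code from the gist or
from `String.java`: the class name `CopyBytesSketch` and the exact set of
branches are assumptions made for illustration. The idea is only that each
`System.arraycopy` call site exposes different facts (constant zero offset,
length equal to the source length) that C2 may be able to exploit.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and branch layout are assumptions.
final class CopyBytesSketch {

    // Baseline: one generic call site, nothing known about offset or length.
    static byte[] arrayCopy(byte[] bytes, int offset, int length) {
        byte[] dst = new byte[length];
        System.arraycopy(bytes, offset, dst, 0, length);
        return dst;
    }

    // Variant with one call site per combination of constraints.
    static byte[] optimizedArrayCopy(byte[] bytes, int offset, int length) {
        byte[] dst = new byte[length];
        if (offset == 0 && length == bytes.length) {
            // Whole-array copy: both positions are 0, length is bytes.length.
            System.arraycopy(bytes, 0, dst, 0, bytes.length);
        } else if (offset == 0) {
            // Prefix copy: source position is the constant 0.
            System.arraycopy(bytes, 0, dst, 0, length);
        } else {
            // General case: same as the baseline.
            System.arraycopy(bytes, offset, dst, 0, length);
        }
        return dst;
    }

    // Arrays.copyOfRange-based variant, for comparison.
    static byte[] copyOf(byte[] bytes, int offset, int length) {
        return Arrays.copyOfRange(bytes, offset, offset + length);
    }
}
```

All three methods return the same bytes for valid arguments; any difference
would only show up in the code C2 generates for each call site, which has to
be confirmed with a benchmark and the generated assembly.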
-------------
PR: https://git.openjdk.org/jdk/pull/12453