On Fri, 17 Dec 2021 22:33:30 GMT, Dean Long <dl...@openjdk.org> wrote:
>> Originally this was spotted by by Amir Hadadi in >> https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor >> >> It looks like in the following code in `String(byte[], int, int, Charset)` >> >> while (offset < sl) { >> int b1 = bytes[offset]; >> if (b1 >= 0) { >> dst[dp++] = (byte)b1; >> offset++; // <--- >> continue; >> } >> if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && >> offset + 1 < sl) { >> int b2 = bytes[offset + 1]; >> if (!isNotContinuation(b2)) { >> dst[dp++] = (byte)decode2(b1, b2); >> offset += 2; >> continue; >> } >> } >> // anything not a latin1, including the repl >> // we have to go with the utf16 >> break; >> } >> >> bounds check elimination is not executed when accessing byte array via >> `bytes[offset]. >> >> The reason, I guess, is that offset variable is modified within the loop >> (marked with arrow). >> >> Possible fix for this could be changing: >> >> `while (offset < sl)` ---> `while (offset >= 0 && offset < sl)` >> >> However the best is to invest in C2 optimization to handle all such cases. >> >> The following benchmark demonstrates good improvement: >> >> @State(Scope.Thread) >> @BenchmarkMode(Mode.AverageTime) >> @OutputTimeUnit(TimeUnit.NANOSECONDS) >> public class StringConstructorBenchmark { >> private byte[] array; >> private String str; >> >> @Setup >> public void setup() { >> str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. >> Я";//Latin1 ending with Russian >> array = str.getBytes(StandardCharsets.UTF_8); >> } >> >> @Benchmark >> public String newString() { >> return new String(array, 0, array.length, StandardCharsets.UTF_8); >> } >> >> @Benchmark >> public String translateEscapes() { >> return str.translateEscapes(); >> } >> } >> >> Results: >> >> //baseline >> Benchmark Mode Cnt Score Error Units >> StringConstructorBenchmark.newString avgt 50 173,092 ± 3,048 ns/op >> >> //patched >> Benchmark Mode Cnt Score Error Units >> StringConstructorBenchmark.newString avgt 50 126,908 ± 2,355 ns/op >> >> The same is observed in String.translateEscapes() for the same String as in >> the benchmark above: >> >> //baseline >> Benchmark Mode Cnt Score Error Units >> StringConstructorBenchmark.translateEscapes avgt 100 53,627 ± 0,850 ns/op >> >> //patched >> Benchmark Mode Cnt Score Error Units >> StringConstructorBenchmark.translateEscapes avgt 100 48,087 ± 1,129 ns/op >> >> Also I've looked into this with `LinuxPerfAsmProfiler`, full output for >> baseline is available here >> https://gist.github.com/stsypanov/d2524f98477d633fb1d4a2510fedeea6 and for >> patched code here >> https://gist.github.com/stsypanov/16c787e4f9fa3dd122522f16331b68b7. >> >> Here's the part of baseline assembly responsible for `while` loop: >> >> 3.62% ││ │ 0x00007fed70eb4c1c: mov %ebx,%ecx >> 2.29% ││ │ 0x00007fed70eb4c1e: mov %edx,%r9d >> 2.22% ││ │ 0x00007fed70eb4c21: mov (%rsp),%r8 >> ;*iload_2 {reexecute=0 rethrow=0 return_oop=0} >> ││ │ >> ; - java.lang.String::<init>@107 (line 537) >> 2.32% ↘│ │ 0x00007fed70eb4c25: cmp %r13d,%ecx >> │ │ 0x00007fed70eb4c28: jge >> 0x00007fed70eb5388 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} >> │ │ >> ; - java.lang.String::<init>@110 (line 537) >> 3.05% │ │ 0x00007fed70eb4c2e: cmp >> 0x8(%rsp),%ecx >> │ │ 0x00007fed70eb4c32: jae >> 0x00007fed70eb5319 >> 2.38% │ │ 0x00007fed70eb4c38: mov %r8,(%rsp) >> 2.64% │ │ 0x00007fed70eb4c3c: movslq %ecx,%r8 >> 2.46% │ │ 0x00007fed70eb4c3f: mov %rax,%rbx >> 3.44% │ │ 0x00007fed70eb4c42: sub %r8,%rbx >> 2.62% │ │ 0x00007fed70eb4c45: add $0x1,%rbx >> 2.64% │ │ 0x00007fed70eb4c49: and >> $0xfffffffffffffffe,%rbx >> 2.30% │ │ 0x00007fed70eb4c4d: mov %ebx,%r8d >> 3.08% │ │ 0x00007fed70eb4c50: add %ecx,%r8d >> 2.55% │ │ 0x00007fed70eb4c53: movslq %r8d,%r8 >> 2.45% │ │ 0x00007fed70eb4c56: add >> $0xfffffffffffffffe,%r8 >> 2.13% │ │ 0x00007fed70eb4c5a: cmp (%rsp),%r8 >> │ │ 0x00007fed70eb4c5e: jae >> 0x00007fed70eb5319 >> 3.36% │ │ 0x00007fed70eb4c64: mov %ecx,%edi >> ;*aload_1 {reexecute=0 rethrow=0 return_oop=0} >> │ │ >> ; - java.lang.String::<init>@113 (line 538) >> 2.86% │ ↗│ 0x00007fed70eb4c66: movsbl >> 0x10(%r14,%rdi,1),%r8d ;*baload {reexecute=0 rethrow=0 return_oop=0} >> │ ││ >> ; - java.lang.String::<init>@115 (line 538) >> 2.48% │ ││ 0x00007fed70eb4c6c: mov %r9d,%edx >> 2.26% │ ││ 0x00007fed70eb4c6f: inc %edx >> ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> │ ││ >> ; - java.lang.String::<init>@127 (line 540) >> 3.28% │ ││ 0x00007fed70eb4c71: mov %edi,%ebx >> 2.44% │ ││ 0x00007fed70eb4c73: inc %ebx >> ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> │ ││ >> ; - java.lang.String::<init>@134 (line 541) >> 2.35% │ ││ 0x00007fed70eb4c75: test %r8d,%r8d >> ╰ ││ 0x00007fed70eb4c78: jge >> 0x00007fed70eb4c04 ;*iflt {reexecute=0 rethrow=0 return_oop=0} >> ││ >> ; - java.lang.String::<init>@120 (line 539) >> >> and this one is for patched code: >> >> 17.28% ││ 0x00007f6b88eb6061: mov %edx,%r10d >> ;*iload_2 {reexecute=0 rethrow=0 return_oop=0} >> ││ >> ; - java.lang.String::<init>@107 (line 537) >> 0.11% ↘│ 0x00007f6b88eb6064: test %r10d,%r10d >> │ 0x00007f6b88eb6067: jl 0x00007f6b88eb669c >> ;*iflt {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@108 (line 537) >> 0.39% │ 0x00007f6b88eb606d: cmp %r13d,%r10d >> │ 0x00007f6b88eb6070: jge 0x00007f6b88eb66d0 >> ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@114 (line 537) >> 0.66% │ 0x00007f6b88eb6076: mov %ebx,%r9d >> 13.70% │ 0x00007f6b88eb6079: cmp 0x8(%rsp),%r10d >> 0.01% │ 0x00007f6b88eb607e: jae 0x00007f6b88eb6671 >> 0.14% │ 0x00007f6b88eb6084: movsbl 0x10(%r14,%r10,1),%edi >> ;*baload {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@119 (line 538) >> 0.37% │ 0x00007f6b88eb608a: mov %r9d,%ebx >> 0.99% │ 0x00007f6b88eb608d: inc %ebx >> ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@131 (line 540) >> 12.88% │ 0x00007f6b88eb608f: movslq %r9d,%rsi >> ;*bastore {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@196 (line 548) >> 0.17% │ 0x00007f6b88eb6092: mov %r10d,%edx >> 0.39% │ 0x00007f6b88eb6095: inc %edx >> ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> │ >> ; - java.lang.String::<init>@138 (line 541) >> 0.96% │ 0x00007f6b88eb6097: test %edi,%edi >> 0.02% │ 0x00007f6b88eb6099: jl 0x00007f6b88eb60dc >> ;*iflt {reexecute=0 rethrow=0 return_oop=0} >> >> Between instructions mapped to `if_icmpge` and `aload_1` in baseline we have >> bounds check which is missing from patched code. > > This does look like a HotSpot JIT compiler issue to me. My guess is that it > is related to how checkBoundsOffCount() checks for offset < 0: > > 396 if ((length | fromIndex | size) < 0 || size > length - fromIndex) > > using | to combine three values. @dean-long actually the issue reproduces with Java 17 where `checkBoundsOffCount` was implemented in a more straight forward way: static void checkBoundsOffCount(int offset, int count, int length) { if (offset < 0 || count < 0 || offset > length - count) { throw new StringIndexOutOfBoundsException( "offset " + offset + ", count " + count + ", length " + length); } } Here's a [gist](https://gist.github.com/amirhadadi/9505c3f5d9ad68cad2fbfd1b9e01f0b8) with a benchmark you can run. In this benchmark I'm comparing safe and unsafe reads from the byte array (I didn't modify the code to add the offset >= 0 condition). Here are the results: Benchmark Mode Cnt Score Error Units StringBenchmark.safeDecoding avgt 20 120.312 ± 11.674 ns/op StringBenchmark.unsafeDecoding avgt 20 72.628 ± 0.479 ns/op ------------- PR: https://git.openjdk.java.net/jdk/pull/6812