On Fri, 11 Sep 2020 23:04:01 GMT, Jason Tatton <github.com+70893615+jasontatton-...@openjdk.org> wrote:
>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 >> byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) >> intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality >> to intrisified private method) as well as >> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this >> patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test >> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new >> intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code >> path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In >> summary they are: >> - A “short” string of < 16 characters. >> - A SIMD String of 16 – 31 characters. >> - A AVX2 SIMD String of 32 characters+. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) >> • Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor). >> >> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64. >> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make >> in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For “short” Strings (see below), I think it may be possible to modify the >> existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- >> |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 >> | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | >> ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- >> |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 >> | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | >> ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 >> indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been >> achieved. Similar results were obtained when >> running on ARM. > > Hi Andrew, > > The current indexOf char intrinsics for StringUTF16 and the new one here for > StringLatin1 both use the AVX2 – i.e. 256 > bit instructions, these are also affected by the frequency scaling which > affects the AVX-512 instructions you pointed > out. Of course in a world where all the work taking place is AVX instructions > this wouldn’t be an issue but in mixed > mode execution this is a problem. However, the compiler does have knowledge > of the capability of the CPU upon which > it’s optimizing code for and is able to decide whether to use AVX > instructions if they are supported by the CPU AND if > it wouldn’t be detrimental for performance. In fact, there is a flag which > one can use to interact with > this: -XX:UseAVX=version. This of course made testing this patch an > interesting experience as the AVX2 instructions > were not enabled on the Xeon processors which I had access to at AWS, but in > the end I was able to use an i7 on my > corporate macbook to validate the code. From: mlbridge[bot] > <notificati...@github.com> Sent: 11 September 2020 17:01 > To: openjdk/jdk <j...@noreply.github.com> Cc: Tatton, Jason > <jptat...@amazon.com>; Mention <ment...@noreply.github.com> > Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) > (#71) > > > Mailing list message from Andrew Haley<mailto:a...@redhat.com> on > hotspot-dev<mailto:hotspot-...@openjdk.java.net>: > > On 11/09/2020 11:23, Jason Tatton wrote: > > For the x86 implementation there may be two further improvements we > can make in order to improve performance of both the StringUTF16 and > StringLatin1 indexOf(char) intrinsics: > > 1. Make use of AVX-512 instructions. > > Is this really a good idea? > > When the processor detects Intel AVX instructions, additional > voltage is applied to the core. With the additional voltage applied, > the processor could run hotter, requiring the operating frequency to > be reduced to maintain operations within the TDP limits. The higher > voltage is maintained for 1 millisecond after the last Intel AVX > instruction completes, and then the voltage returns to the nominal > TDP voltage level. > > https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf > > So, if StringLatin1.indexOf(char) executes enough to make a difference > to any real-world program, the effect may well be to slow down the clock > for all of the code that does not use AVX. > > 2. For ?short? Strings (see below), I think it may be possible to modify the > existing algorithm to still use SSE SIMD > instructions instead of a loop. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. <https://www.redhat.com> > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on > GitHub<https://github.com/openjdk/jdk/pull/71#issuecomment-691179797>, or > unsubscribe<https://github.com/notifications/unsubscribe-auth/AQ44AL33DFEQ36TCHSTZKCLSFJCUJANCNFSM4Q74AKCQ>. Hi everyone, This patch is just missing a couple of reviewers... Please can someone step forward? I think it's a fairly straightforward change. -Jason ------------- PR: https://git.openjdk.java.net/jdk/pull/71