On Wed, 1 Feb 2023 19:07:17 GMT, Scott Gibbons <d...@openjdk.org> wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3   4309.439 ±   2.632  ops/ms
>>
>> **New:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>>
>> Decode performance:
>> **Old:**
>>
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3   3961.768 ± 93.409  ops/ms
>>
>> **New:**
>>
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
>   Change break-even buffer size for AVX512

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2693:

> 2691:     __ vpshufb(xmm0, xmm0, xmm13, Assembler::AVX_256bit);
> 2692:     __ vpermd(xmm0, xmm12, xmm0, Assembler::AVX_256bit);
> 2693:     __ subl(length, 0x20);

The subtraction already sets EFLAGS, so we can save one redundant compare per iteration on [#L2697](https://github.com/openjdk/jdk/pull/12126/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R2697) by subtracting 0x2c (44) before the loop and adding the same amount back after the loop. The same applies to the encode main loop.

-------------

PR: https://git.openjdk.org/jdk/pull/12126
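
For illustration only, the loop-bias suggestion in the review comment can be modelled in plain C++. This is a sketch under stated assumptions, not the HotSpot stub: the function names, the constants `kStep` and `kBias`, and the entry guard are hypothetical, and the vector body is replaced by an iteration counter. It assumes the existing loop subtracts 0x20 per iteration and then compares the remaining length against 0x2c.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants modelling the values mentioned in the review.
constexpr int32_t kStep = 0x20;  // bytes consumed per loop iteration
constexpr int32_t kBias = 0x2c;  // minimum remaining length for another iteration

// Assumed current shape: subtract, then an explicit compare each iteration.
int32_t loop_with_compare(int32_t length, int& iterations) {
    iterations = 0;
    if (length < kBias) return length;  // assumed pre-loop guard
    do {
        ++iterations;                   // stands in for the vector body
        length -= kStep;                // models subl(length, 0x20)
    } while (length >= kBias);          // models the separate cmpl + jcc
    return length;
}

// Suggested shape: bias the counter once so the sign of the subtraction
// result decides the branch, then undo the bias after the loop.
int32_t loop_with_bias(int32_t length, int& iterations) {
    iterations = 0;
    if (length < kBias) return length;  // same assumed guard
    length -= kBias;                    // pre-loop subtraction of 0x2c
    do {
        ++iterations;
        length -= kStep;                // flags from this subtraction drive the branch
    } while (length >= 0);              // no separate compare needed
    return length + kBias;              // post-loop addition restores the real length
}
```

Both functions run the same number of iterations and return the same remaining length, since `length - kBias - k * kStep >= 0` holds exactly when `length - k * kStep >= kBias`. In the stub itself, the branch after `subl` would test the flags directly instead of the C++ `>= 0` comparison.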