On Tue, 18 Jan 2022 10:08:35 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> This resolves minor inefficiency in the fast-path for decoding latin-1 chars
> from UTF-8. I also took the opportunity to refactor the StringDecode
> microbenchmark to align with recent changes to the StringEncode micro.
>
> The inefficiency is that this test is quite branchy:
>
> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
>
> Since the two constant bytes differ only on the lowest bit this can be
> transformed to this, saving us a branch:
>
> `if ((b1 & 0xfe) == 0xc2 && ...`
>
> This provides a small speed-up on microbenchmarks where the input can be
> internally encoded as latin1:
>
>
> Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op
>
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op

This pull request has now been integrated.

Changeset: e314a4cf
Author:    Claes Redestad <redes...@openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/e314a4cfda30cc680b3f0aef8c62b75ff81bdbb1
Stats:     139 lines in 2 files changed: 93 ins; 34 del; 12 mod

8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes

Reviewed-by: rriggs, alanb, naoto

-------------

PR: https://git.openjdk.java.net/jdk/pull/7122
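
For readers who want to convince themselves that the masked test quoted above accepts exactly the same lead bytes as the two-comparison test, here is a minimal standalone sketch that checks all 256 byte values. The class and method names (`Latin1LeadByteCheck`, `branchy`, `masked`, `countMismatches`) are made up for illustration and are not taken from the JDK sources; only the two conditions themselves come from the message. For context, 0xc2 and 0xc3 are the UTF-8 lead bytes of the two-byte sequences that encode U+0080 through U+00FF, which is why they matter for the latin-1 fast path.

```java
public class Latin1LeadByteCheck {

    // Original form from the message: two comparisons joined by ||.
    static boolean branchy(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Transformed form from the message: clear the lowest bit, compare once.
    // b1 is promoted to int before the &, and the 0xfe mask also discards
    // the sign-extension bits, so comparing against the int 0xc2 is safe.
    static boolean masked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Counts the byte values on which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = 0; i < 256; i++) {
            byte b1 = (byte) i;
            if (branchy(b1) != masked(b1)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

This only demonstrates that the two conditions are logically equivalent. Whether the masked form is actually faster depends on the JIT and the hardware, which is what the StringDecode microbenchmark figures in the message measure.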