Re: RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes

Roger Riggs Tue, 18 Jan 2022 07:39:56 -0800

On Tue, 18 Jan 2022 10:08:35 GMT, Claes Redestad <[email protected]> wrote:


> This resolves a minor inefficiency in the fast path for decoding latin-1 chars
> from UTF-8. I also took the opportunity to refactor the StringDecode
> microbenchmark to align with recent changes to the StringEncode micro.
> 
> The inefficiency is that this test is quite branchy:
> 
> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
> 
> Since the two constant bytes differ only in the lowest bit, the test can be
> rewritten as a single masked comparison, saving us a branch:
> 
> `if ((b1 & 0xfe) == 0xc2 && ...`
> 
> This provides a small speed-up on microbenchmarks where the input can be 
> internally encoded as latin1:
> 
> 
> Benchmark                           (charsetName)  Mode  Cnt     Score     Error  Units
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591  ± 12.332  ns/op
> 
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984  ± 13.136  ns/op
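
For readers following along, here is a minimal sketch of why the masked test quoted above is equivalent to the two-comparison form. It is not the JDK code itself: the class and method names are hypothetical, and only the two expressions are taken from the PR description. 0xc2 is 1100 0010 and 0xc3 is 1100 0011, so clearing the lowest bit maps both to 0xc2. Note that `b1 & 0xfe` promotes the signed byte to int, and the mask also discards the sign-extension bits, which is why the result can be compared against the int constant 0xc2 without a cast.

```java
public class Latin1LeadByteCheck {

    // Original form from the PR description: two comparisons.
    static boolean isLatin1LeadBranchy(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: mask off the lowest bit, then compare once.
    // b1 is promoted to int before the AND; 0xfe keeps only bits 1..7.
    static boolean isLatin1LeadMasked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Exhaustively compare both forms over every possible byte value.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (isLatin1LeadBranchy(b) != isLatin1LeadMasked(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Running `main` prints `mismatches: 0`. Whether the JIT would have merged the two comparisons on its own is not something this sketch can show; it only checks that the two predicates agree.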

LGTM

-------------

Marked as reviewed by rriggs (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7122
