On Fri, 25 Feb 2022 09:19:33 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> Splitting out these micro changes from #7231
>> 
>> - Clean up and simplify setup and code
>> - Add variants with different inputs of varying lengths and encoding
>> weights, as well as relevant mixes of each, so that we cover interesting
>> corner cases and also verify that performance behaves when there's a
>> multitude of input shapes. Both simple and mixed variants are interesting
>> diagnostic tools.
>> - Drop all charsets from the default run configuration except UTF-8. 
>> Motivation: Defaults should give good coverage but try to keep runtimes at 
>> bay. Additionally if the charset tested can't encode the higher codepoints 
>> used in these micros the results might be misleading. If you for example 
>> test using ISO-8859-1 the UTF-16 bytes in StringDecode.decodeUTF16 will have 
>> all been replaced by `?`, so the test is effectively the same as testing 
>> ASCII-only.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Brent review
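
The charset point in the quoted description can be checked outside the benchmark. This is a minimal sketch, not code from the PR: the class name, method name and sample text are hypothetical. It relies only on the documented behaviour of `String.getBytes(Charset)`, which substitutes the charset's replacement bytes for unmappable characters.

```java
import java.nio.charset.StandardCharsets;

class CharsetReplacementSketch {
    // Encode with ISO-8859-1, then decode with the same charset.
    // Characters that ISO-8859-1 cannot represent come back as '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three ASCII characters followed by two CJK characters (hypothetical sample).
        String mixed = "abc\u4e2d\u6587";

        // The CJK characters are lost, so a decode benchmark fed these bytes
        // would only ever see single-byte, ASCII-range input.
        System.out.println(roundTripLatin1(mixed));

        // UTF-8 keeps them, at three bytes each.
        System.out.println(mixed.getBytes(StandardCharsets.UTF_8).length);
    }
}
```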

test/micro/org/openjdk/bench/java/lang/StringDecode.java line 72:

> 70:     public void setup() {
> 71:         charset = Charset.forName(charsetName);
> 72:         asciiString = LOREM.substring(0, 32).getBytes(charset);

This is problematic IMO in that it's missing short strings such as "Claes". 
Average Java strings are about 32 bytes long AFAICR, and people writing 
(vectorized) intrinsics have a nasty habit of optimizing for long strings, to 
the detriment of typical-length ones.
Whether we like it or not, people will optimize for benchmarks, so it's 
important that benchmark data is realistic. The shortest here is 15 bytes, as 
far as I can see. I'd certainly include a short string of just a few bytes so 
that intrinsics don't cause regressions in important cases.
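
As a rough sketch of what I mean, the setup could derive inputs of several lengths from the same text, including one of just a few bytes. Everything here is an assumption for illustration: the `SAMPLE` text stands in for the benchmark's `LOREM` constant, and the lengths 5, 15 and 32 are hypothetical choices, not values taken from the PR.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ShortInputSketch {
    // Stand-in for the benchmark's LOREM constant (assumed, not the real text).
    static final String SAMPLE =
            "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

    // Hypothetical lengths: a very short string, a mid-length one,
    // and the 32-character prefix the current setup already uses.
    static final int[] LENGTHS = {5, 15, 32};

    // Encode one prefix of SAMPLE per requested length.
    static byte[][] asciiInputs(Charset charset) {
        byte[][] inputs = new byte[LENGTHS.length][];
        for (int i = 0; i < LENGTHS.length; i++) {
            inputs[i] = SAMPLE.substring(0, LENGTHS[i]).getBytes(charset);
        }
        return inputs;
    }

    public static void main(String[] args) {
        for (byte[] input : asciiInputs(StandardCharsets.UTF_8)) {
            // Decoding each input is the operation the benchmark would time.
            String decoded = new String(input, StandardCharsets.UTF_8);
            System.out.println(input.length + " bytes: " + decoded);
        }
    }
}
```

In the actual micro, each length would presumably be its own benchmark method or parameter value, so that a regression on the shortest input shows up on its own instead of being averaged away.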

-------------

PR: https://git.openjdk.java.net/jdk/pull/7516
