On Fri, 25 Feb 2022 09:19:33 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> Splitting out these micro changes from #7231 >> >> - Clean up and simplify setup and code >> - Add variants with different inputs with varying lengths and encoding >> weights, but also relevant mixes of each so that we both cover interesting >> corner cases but also verify that performance behave when there's a >> multitude of input shapes. Both simple and mixed variants are interesting >> diagnostic tools. >> - Drop all charsets from the default run configuration except UTF-8. >> Motivation: Defaults should give good coverage but try to keep runtimes at >> bay. Additionally if the charset tested can't encode the higher codepoints >> used in these micros the results might be misleading. If you for example >> test using ISO-8859-1 the UTF-16 bytes in StringDecode.decodeUTF16 will have >> all been replaced by `?`, so the test is effectively the same as testing >> ASCII-only. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Brent review test/micro/org/openjdk/bench/java/lang/StringDecode.java line 72: > 70: public void setup() { > 71: charset = Charset.forName(charsetName); > 72: asciiString = LOREM.substring(0, 32).getBytes(charset); This is problematic IMO in that it's missing short strings such as "Claes". Average Java strings are about 32 bytes long AFAICR, and people writing (vectorized) ijntrinsics have a nasty habit of optimizing for long strings, to the detriment of typical-length ones. Whether we like it or not, people will optimize for benchmarks, so it's important that benchmark data is realistic. The shortest here is 15 bytes, as far as I can see. I'd certainly include a short string of just a few bytes so that intrinsics don't cause regressions in important cases. ------------- PR: https://git.openjdk.java.net/jdk/pull/7516