On Wed, 29 Oct 2025 20:41:06 GMT, Roger Riggs <[email protected]> wrote:

>> ### Long: packing 1:M-count + 1-3 folding codepoints
>>
>> https://cr.openjdk.org/~sherman/casefolding_long/
>>
>> The performance is slightly better, but not as good as I would have
>> expected. The access to codepoint from the long looks a little clumsy, but
>> the logic looks smooth. need more work. opinion?
>>
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> StringCompareToFoldCase.asciiLower           avgt   15  15.487 ± 0.298  ns/op
>> StringCompareToFoldCase.asciiLowerEQ         avgt   15  10.005 ± 0.368  ns/op
>> StringCompareToFoldCase.asciiLowerEQFC       avgt   15  10.755 ± 0.160  ns/op
>> StringCompareToFoldCase.asciiLowerFC         avgt   15  10.349 ± 0.155  ns/op
>> StringCompareToFoldCase.asciiUpperLower      avgt   15  12.188 ± 0.278  ns/op
>> StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  10.901 ± 0.551  ns/op
>> StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   9.218 ± 0.165  ns/op
>> StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   9.335 ± 0.404  ns/op
>> StringCompareToFoldCase.asciiWithDFFC        avgt   15  37.010 ± 0.518  ns/op
>> StringCompareToFoldCase.greekLower           avgt   15  39.572 ± 0.098  ns/op
>> StringCompareToFoldCase.greekLowerEQ         avgt   15  39.317 ± 0.104  ns/op
>> StringCompareToFoldCase.greekLowerEQFC       avgt   15  20.428 ± 0.243  ns/op
>> StringCompareToFoldCase.greekLowerFC         avgt   15  19.623 ± 0.141  ns/op
>> StringCompareToFoldCase.greekUpperLower      avgt   15   7.105 ± 0.048  ns/op
>> StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.462 ± 0.092  ns/op
>> StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   6.518 ± 0.128  ns/op
>> StringCompareToFoldCase.greekUpperLowerFC    avgt   15   6.593 ± 0.240  ns/op
>> StringCompareToFoldCase.latin1UTF16          avgt   15  23.130 ± 0.152  ns/op
>> StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.606 ± 0.089  ns/op
>> StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  29.574 ± 0.348  ns/op
>> StringCompareToFoldCase.latin1UTF16FC        avgt   15  29.691 ± 0.445  ns/op
>> StringCompareToFoldCase.supLower             avgt   15  55.027 ± 0.676  ns/op
>> StringCompareToFoldCase.supLowerEQ           avgt   15  55.784 ± 0.368  ns/op
>> StringCompareToFoldCase.supLowerEQFC         avgt   15  24.984 ± 0.157  ns/op
>> StringCompareToFoldCase.supLowerFC           avgt   15  24.865 ± 0.139  ns/op
>> StringCompareToFoldCase.supUpperLower        avgt   15  14.538 ± 0.144  ns/op
>> StringCompareToFoldCas...
>
>> Experimenting with Arrays.mismatch at the beginning of the array iteration as
>> ...
>> The benchmark results suggest that it does help 'dramatically' when the
>> compared strings share with the same prefix. For example those "UpperLower"
>> test cases (which shares the same upper cases text prefix. However it is
>> also relatively expensive, with a 20%-ish overhead when the strings do not
>> share the same string text but are case-insensitively equals. I would
>> suggest let's leave it out for now?
>
>> ```
> Ok to leave it out for now. In similar contexts where System.arraycopy or
> Arrays.mismatch has some overhead I've suggested doing a simple check (like
> `size < 8`) to avoid the overhead when the strings/byte arrays are short.
> Thanks for checking.

> The performance is slightly better, but not as good as I would have expected.
> The access to codepoint from the long looks a little clumsy, but the logic
> looks smooth. need more work. opinion?

It does look cleaner without the array indexing in the loops.
Can the counting of characters (fcnt1, fcnt2) be eliminated by encoding three
20-bit characters into the long and then checking `f1 != 0` to indicate that
there are more characters?
It's a bit of an odd mix of 16-bit characters vs. a single 20-bit char.
Are there any 20-bit chars from or to folded replacements in the folding
mappings?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2475481372
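
For reference, the `size < 8` guard mentioned in the quoted reply could look
roughly like the sketch below. The class name `PrefixSkip`, the method
`commonPrefix` and the constant `MIN_LEN` are hypothetical and not taken from
the PR; the threshold of 8 is only the example value from the comment, not a
measured cut-off. Only `Arrays.mismatch` is a real JDK call.

```java
import java.util.Arrays;

// Hypothetical helper, not code from the PR: skip the Arrays.mismatch call
// for short inputs, where its fixed overhead may outweigh the benefit.
final class PrefixSkip {
    // Assumed threshold, mirroring the "size < 8" example in the review.
    static final int MIN_LEN = 8;

    // Returns the number of leading bytes that are identical in a and b,
    // or 0 when the inputs are too short for the check to be worthwhile.
    static int commonPrefix(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        if (len < MIN_LEN) {
            return 0; // short input: let the folding loop start at index 0
        }
        int k = Arrays.mismatch(a, 0, len, b, 0, len);
        return k < 0 ? len : k; // -1 means the compared ranges are identical
    }
}
```

A caller would start the case-folding loop at the returned index. For UTF16
byte arrays the index would presumably still need rounding down to a char
boundary, and possibly backing up past a surrogate, which the sketch does not
attempt.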

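The count-free packing asked about above could be sketched as follows. This is
only an illustration of the idea, not the layout used in the webrev:
`PackedFold`, `pack` and `unpack` are hypothetical names. Since code points run
up to U+10FFFF, which needs 21 bits, the sketch uses 21-bit slots; three of
them take 63 bits and still fit in a long. It also assumes U+0000 never occurs
as a folding target, because zero doubles as the "no more characters" marker.

```java
// Hypothetical sketch, not code from the PR: pack 1 to 3 folded code points
// into one long and read them back by testing the remaining bits against 0.
final class PackedFold {
    private static final int BITS = 21;                 // U+10FFFF fits in 21 bits
    private static final long MASK = (1L << BITS) - 1;

    // First code point goes into the lowest slot, so it is read first.
    static long pack(int... cps) {
        if (cps.length < 1 || cps.length > 3) {
            throw new IllegalArgumentException("expected 1 to 3 code points");
        }
        long packed = 0;
        for (int i = cps.length - 1; i >= 0; i--) {
            packed = (packed << BITS) | cps[i];
        }
        return packed;
    }

    // No explicit count: keep consuming slots until nothing is left.
    static String unpack(long f) {
        StringBuilder sb = new StringBuilder();
        while (f != 0) {
            sb.appendCodePoint((int) (f & MASK));
            f >>>= BITS;
        }
        return sb.toString();
    }
}
```

For example, `PackedFold.unpack(PackedFold.pack(0x73, 0x73))` yields `"ss"`,
the full case folding of U+00DF. A comparison loop would consume `f1` and `f2`
one slot at a time in the same way instead of building a string.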