On Fri, 1 Aug 2025 12:34:15 GMT, Brett Okken <d...@openjdk.org> wrote:
> As suggested on mailing list, when encoding latin1 bytes to utf-8, we can
> count the leading positive bytes and in the case where there is a negative,
> we can copy all the positive values to the target byte[] prior to processing
> the remaining data 1 byte at a time.
>
> https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149417.html

Benchmark on win64

Baseline:

Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
StringEncode.encodeAllMixed                 UTF-8  avgt   10  20067.519 ± 528.152  ns/op
StringEncode.encodeAsciiLong                UTF-8  avgt   10  12115.389 ± 307.491  ns/op
StringEncode.encodeAsciiShort               UTF-8  avgt   10     70.098 ±   1.696  ns/op
StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10   1974.391 ± 162.405  ns/op
StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    270.097 ±  13.840  ns/op
StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   1876.366 ±  51.971  ns/op
StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   4973.070 ± 130.426  ns/op
StringEncode.encodeLatin1Short              UTF-8  avgt   10     96.227 ±   2.816  ns/op
StringEncode.encodeShortMixed               UTF-8  avgt   10    360.586 ±   8.691  ns/op
StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1534.748 ±  34.584  ns/op
StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    528.919 ±  15.143  ns/op
StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2275.117 ±  50.152  ns/op
StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4398.943 ± 116.607  ns/op
StringEncode.encodeUTF16Short               UTF-8  avgt   10    152.219 ±   8.677  ns/op

Patch:

Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
StringEncode.encodeAllMixed                 UTF-8  avgt   10  18876.056 ± 330.644  ns/op
StringEncode.encodeAsciiLong                UTF-8  avgt   10  12040.590 ± 165.905  ns/op
StringEncode.encodeAsciiShort               UTF-8  avgt   10     69.895 ±   0.318  ns/op
StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10    574.455 ±  14.769  ns/op
StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    284.553 ±   1.886  ns/op
StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   2230.789 ±  11.043  ns/op
StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   3278.998 ±  96.779  ns/op
StringEncode.encodeLatin1Short              UTF-8  avgt   10     99.332 ±   1.977  ns/op
StringEncode.encodeShortMixed               UTF-8  avgt   10    378.183 ±  17.504  ns/op
StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1531.960 ±  19.300  ns/op
StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    563.810 ±   4.811  ns/op
StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2270.970 ±  28.495  ns/op
StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4403.824 ±  60.338  ns/op
StringEncode.encodeUTF16Short               UTF-8  avgt   10    158.600 ±   2.044  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26597#issuecomment-3144446972
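
For readers who have not opened the PR, the approach described in the quoted text can be sketched roughly as follows. This is only an illustration and not the code in the patch: the class name `Latin1ToUtf8Sketch` and the helper `countLeadingPositives` are hypothetical, and the helper is a plain-loop stand-in for whatever positive-byte counting routine the JDK uses internally.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not taken from the PR.
class Latin1ToUtf8Sketch {

    // Assumption: a simple loop standing in for the JDK's internal counter.
    // Returns the number of leading bytes that are non-negative (ASCII range).
    static int countLeadingPositives(byte[] src) {
        int i = 0;
        while (i < src.length && src[i] >= 0) {
            i++;
        }
        return i;
    }

    // Encodes Latin-1 bytes as UTF-8.
    static byte[] encode(byte[] latin1) {
        int positives = countLeadingPositives(latin1);
        if (positives == latin1.length) {
            // Pure ASCII: the UTF-8 form is byte-for-byte identical.
            return Arrays.copyOf(latin1, latin1.length);
        }
        // Worst case: every byte after the ASCII prefix expands to two bytes.
        byte[] dst = new byte[positives + (latin1.length - positives) * 2];
        // Bulk-copy the ASCII prefix before falling back to the per-byte loop.
        System.arraycopy(latin1, 0, dst, 0, positives);
        int dp = positives;
        for (int sp = positives; sp < latin1.length; sp++) {
            byte b = latin1[sp];
            if (b < 0) {
                // 0x80..0xFF becomes a two-byte UTF-8 sequence.
                dst[dp++] = (byte) (0xC0 | ((b & 0xFF) >> 6));
                dst[dp++] = (byte) (0x80 | (b & 0x3F));
            } else {
                dst[dp++] = b;
            }
        }
        // Trim if the worst-case estimate was too large.
        return dp == dst.length ? dst : Arrays.copyOf(dst, dp);
    }
}
```

In this sketch the bulk copy only helps when the first negative byte appears late in the input, which would be consistent with encodeLatin1LongEnd showing the largest change in the numbers above.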