On Wed, 29 Jan 2025 16:36:24 GMT, Shaojin Wen <[email protected]> wrote:
> The byte[] allocated in Integer/Long.toString is fully filled, so we can use
> Unsafe.allocateUninitializedArray to create byte[] to improve performance.

This change shows speedups of roughly 3% to 23% in most aarch64/x64 scenarios (Integers.toStringTiny on Intel Emerald Rapids is essentially flat at 0.65%), but it introduces a regression of about 19% (delta -18.73%) in the Integers.toStringTiny benchmark on AMD EPYC™ Genoa. In the run shown in section 3, the regression sets in at Warmup Iteration 3 and stays stable afterwards.

In the tables below, scores are in us/op (lower is better) and delta = baseline / current - 1, so a positive delta means the current commit is faster.

## 1. Script

    git remote add wenshao [email protected]:wenshao/jdk.git
    git fetch wenshao

    # baseline
    git checkout f98d9a330128302207fb66dfa2555885ad93135f
    make test TEST="micro:java.lang.Longs.toString"
    make test TEST="micro:java.lang.Integers.toString"

    # current
    git checkout 2a06d12fcb7822395960c813d91a34eda0d661ce
    make test TEST="micro:java.lang.Longs.toString"
    make test TEST="micro:java.lang.Integers.toString"

## 2. MacBook M1 Pro (aarch64)

    -# baseline
    -Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
    -Longs.toStringBig          500  avgt   15  7.265 ± 0.063  us/op
    -Longs.toStringSmall        500  avgt   15  3.043 ± 0.051  us/op
    -Integers.toStringBig       500  avgt   15  4.837 ± 0.076  us/op
    -Integers.toStringSmall     500  avgt   15  2.922 ± 0.020  us/op
    -Integers.toStringTiny      500  avgt   15  2.136 ± 0.010  us/op

    +# current
    +Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
    +Longs.toStringBig          500  avgt   15  7.025 ± 0.024  us/op
    +Longs.toStringSmall        500  avgt   15  2.735 ± 0.008  us/op
    +Integers.toStringBig       500  avgt   15  4.592 ± 0.015  us/op
    +Integers.toStringSmall     500  avgt   15  2.632 ± 0.026  us/op
    +Integers.toStringTiny      500  avgt   15  1.734 ± 0.006  us/op

| benchmark | size | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.265 | 7.025 | 3.42% |
| Longs.toStringSmall | 500 | 3.043 | 2.735 | 11.26% |
| Integers.toStringBig | 500 | 4.837 | 4.592 | 5.34% |
| Integers.toStringSmall | 500 | 2.922 | 2.632 | 11.02% |
| Integers.toStringTiny | 500 | 2.136 | 1.734 | 23.18% |

## 3. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

    -# baseline
    -Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
    -Longs.toStringBig          500  avgt   15  8.126 ± 0.027  us/op
    -Longs.toStringSmall        500  avgt   15  3.296 ± 0.029  us/op
    -Integers.toStringBig       500  avgt   15  4.957 ± 0.008  us/op
    -Integers.toStringSmall     500  avgt   15  3.467 ± 0.020  us/op
    -Integers.toStringTiny      500  avgt   15  2.534 ± 0.040  us/op

    +# current
    +Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
    +Longs.toStringBig          500  avgt   15  7.540 ± 0.019  us/op
    +Longs.toStringSmall        500  avgt   15  3.055 ± 0.006  us/op
    +Integers.toStringBig       500  avgt   15  4.646 ± 0.024  us/op
    +Integers.toStringSmall     500  avgt   15  3.173 ± 0.008  us/op
    +Integers.toStringTiny      500  avgt   15  3.118 ± 0.029  us/op

| benchmark | size | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 8.126 | 7.540 | 7.77% |
| Longs.toStringSmall | 500 | 3.296 | 3.055 | 7.89% |
| Integers.toStringBig | 500 | 4.957 | 4.646 | 6.69% |
| Integers.toStringSmall | 500 | 3.467 | 3.173 | 9.27% |
| Integers.toStringTiny | 500 | 2.534 | 3.118 | -18.73% |

For Integers.toStringTiny on the current commit, the degradation begins at Warmup Iteration 3:

    # Warmup Iteration   1: 2.333 us/op
    # Warmup Iteration   2: 2.248 us/op
    # Warmup Iteration   3: 3.118 us/op
    # Warmup Iteration   4: 3.121 us/op
    # Warmup Iteration   5: 3.129 us/op
    # Warmup Iteration   6: 3.122 us/op
    # Warmup Iteration   7: 3.118 us/op
    # Warmup Iteration   8: 3.154 us/op
    # Warmup Iteration   9: 3.097 us/op
    # Warmup Iteration  10: 3.090 us/op
    Iteration   1: 3.090 us/op
    Iteration   2: 3.091 us/op
    Iteration   3: 3.092 us/op
    Iteration   4: 3.093 us/op
    Iteration   5: 3.098 us/op

## 4. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)

    -# baseline
    -Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
    -Longs.toStringBig          500  avgt   15  7.992 ± 0.039  us/op
    -Longs.toStringSmall        500  avgt   15  3.578 ± 0.022  us/op
    -Integers.toStringBig       500  avgt   15  5.536 ± 0.017  us/op
    -Integers.toStringSmall     500  avgt   15  3.657 ± 0.152  us/op
    -Integers.toStringTiny      500  avgt   15  2.638 ± 0.047  us/op

    +# current
    +Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
    +Longs.toStringBig          500  avgt   15  7.731 ± 0.011  us/op
    +Longs.toStringSmall        500  avgt   15  3.413 ± 0.020  us/op
    +Integers.toStringBig       500  avgt   15  4.738 ± 0.021  us/op
    +Integers.toStringSmall     500  avgt   15  3.184 ± 0.140  us/op
    +Integers.toStringTiny      500  avgt   15  2.621 ± 0.126  us/op

| benchmark | size | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.992 | 7.731 | 3.38% |
| Longs.toStringSmall | 500 | 3.578 | 3.413 | 4.83% |
| Integers.toStringBig | 500 | 5.536 | 4.738 | 16.84% |
| Integers.toStringSmall | 500 | 3.657 | 3.184 | 14.86% |
| Integers.toStringTiny | 500 | 2.638 | 2.621 | 0.65% |

## 5. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

    -# baseline
    -Benchmark               (size)  Mode  Cnt   Score   Error  Units (f98d9a33012)
    -Longs.toStringBig          500  avgt   15  11.017 ± 0.084  us/op
    -Longs.toStringSmall        500  avgt   15   4.400 ± 0.078  us/op
    -Integers.toStringBig       500  avgt   15   7.377 ± 0.103  us/op
    -Integers.toStringSmall     500  avgt   15   4.504 ± 0.083  us/op
    -Integers.toStringTiny      500  avgt   15   3.693 ± 0.107  us/op

    +# current
    +Benchmark               (size)  Mode  Cnt   Score   Error  Units (2a06d12fcb7)
    +Longs.toStringBig          500  avgt   15  10.696 ± 0.055  us/op
    +Longs.toStringSmall        500  avgt   15   4.111 ± 0.113  us/op
    +Integers.toStringBig       500  avgt   15   6.815 ± 0.097  us/op
    +Integers.toStringSmall     500  avgt   15   4.136 ± 0.103  us/op
    +Integers.toStringTiny      500  avgt   15   3.588 ± 0.102  us/op

| benchmark | size | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 11.017 | 10.696 | 3.00% |
| Longs.toStringSmall | 500 | 4.400 | 4.111 | 7.03% |
| Integers.toStringBig | 500 | 7.377 | 6.815 | 8.25% |
| Integers.toStringSmall | 500 | 4.504 | 4.136 | 8.90% |
| Integers.toStringTiny | 500 | 3.693 | 3.588 | 2.93% |

## 6. orange_pi5_aarch64 (CPU RK3588S)

    -# baseline
    -Benchmark               (size)  Mode  Cnt   Score   Error  Units (f98d9a33012)
    -Longs.toStringBig          500  avgt   15  23.235 ± 1.973  us/op
    -Longs.toStringSmall        500  avgt   15   8.262 ± 0.555  us/op
    -Integers.toStringBig       500  avgt   15  14.435 ± 0.819  us/op
    -Integers.toStringSmall     500  avgt   15   8.384 ± 0.669  us/op
    -Integers.toStringTiny      500  avgt   15   5.661 ± 0.404  us/op

    +# current
    +Benchmark               (size)  Mode  Cnt   Score   Error  Units (2a06d12fcb7)
    +Longs.toStringBig          500  avgt   15  21.727 ± 1.396  us/op
    +Longs.toStringSmall        500  avgt   15   7.591 ± 0.581  us/op
    +Integers.toStringBig       500  avgt   15  13.682 ± 0.930  us/op
    +Integers.toStringSmall     500  avgt   15   7.691 ± 0.575  us/op
    +Integers.toStringTiny      500  avgt   15   4.943 ± 0.473  us/op

| benchmark | size | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 23.235 | 21.727 | 6.94% |
| Longs.toStringSmall | 500 | 8.262 | 7.591 | 8.84% |
| Integers.toStringBig | 500 | 14.435 | 13.682 | 5.50% |
| Integers.toStringSmall | 500 | 8.384 | 7.691 | 9.01% |
| Integers.toStringTiny | 500 | 5.661 | 4.943 | 14.53% |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23353#issuecomment-2623354805
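
Note on the delta column: the percentages in the tables above are consistent with baseline / current - 1 applied to the us/op scores. That formula is inferred from the reported numbers rather than stated in the comment. A minimal Java sketch of the calculation follows; the class and method names (`DeltaSketch`, `deltaPercent`) are hypothetical and are not part of the PR or the benchmark suite.

```java
import java.util.Locale;

public class DeltaSketch {
    /**
     * Relative change in percent between two avgt scores (us/op, lower is better).
     * Assumed formula: baseline / current - 1. A positive result means the
     * current commit is faster; a negative result means it regressed.
     */
    static double deltaPercent(double baseline, double current) {
        return (baseline / current - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Longs.toStringBig on the MacBook M1 Pro: 7.265 -> 7.025 us/op
        System.out.println(String.format(Locale.ROOT, "%.2f%%", deltaPercent(7.265, 7.025)));
        // Integers.toStringTiny on AMD EPYC Genoa: 2.534 -> 3.118 us/op
        System.out.println(String.format(Locale.ROOT, "%.2f%%", deltaPercent(2.534, 3.118)));
    }
}
```

With this definition the Genoa regression reads as -18.73%. Expressed the other way round, as current / baseline - 1, the same pair of scores is about 23% more time per operation, so it is worth stating which ratio a quoted percentage refers to.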
