klion26 commented on PR #9689: URL: https://github.com/apache/arrow-rs/pull/9689#issuecomment-4396637437
@scovich @alamb I ran the benchmark on EC2 (Graviton; instance and CPU info attached below), and I can see the regression there. I've also tried the following, but haven't found anything useful yet:

- compared the output of `perf stat -e cycles,instructions,cache-references,cache-misses,branch-misses,cpu-clock,task-clock` (details attached below)
- compared the output of `valgrind --tool=callgrind --branch-sim=yes` (details attached below)
- changed `#[inline(always)]` to `#[inline]`

The `critcmp` result (left: main branch, right: current branch):

<img width="1260" height="334" alt="3a9451e4c795a8d8349f85d53b52b5a9" src="https://github.com/user-attachments/assets/f54fbb8c-e5bd-4acd-9d0e-8b3adfea69ac" />

> Instance type: t4g.small
>
> uname: Linux ip-172-31-35-37.x.y.z.a.amzn2023.aarch64 #1 SMP Fri May 1 14:08:03 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux

<details>
<summary>valgrind compare</summary>
<p>

current branch

```
==48976== Callgrind, a call-graph generating cache profiler
==48976== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==48976== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==48976== Command: ./variant_cast_cast_kernels-589283a85a93d289 --bench cast\ decimal32\ to\ uint8
==48976==
==48976== For interactive control, run 'callgrind_control -h'.
==48977==
==48977== Events    : Ir Bc Bcm Bi Bim
==48977== Collected : 663142 107976 5969 392 109
==48977==
==48977== I   refs:      663,142
==48977==
==48977== Branches:      108,368  (107,976 cond + 392 ind)
==48977== Mispredicts:     6,078  (  5,969 cond + 109 ind)
==48977== Mispred rate:      5.6% (    5.5%     + 27.8%   )
"cast decimal32 to uint8"
                        time:   [7.3337 ms 7.3430 ms 7.3539 ms]
                        change: [+7692.6% +7730.7% +7769.2%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
==48976==
==48976== Events    : Ir Bc Bcm Bi Bim
==48976== Collected : 4275596549 515310914 27010015 3919351 14426
==48976==
==48976== I   refs:      4,275,596,549
==48976==
==48976== Branches:        519,230,265  (515,310,914 cond + 3,919,351 ind)
==48976== Mispredicts:      27,024,441  ( 27,010,015 cond +    14,426 ind)
==48976== Mispred rate:            5.2% (        5.2%     +       0.4%   )
```

main branch

```
==48981== Callgrind, a call-graph generating cache profiler
==48981== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==48981== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==48981== Command: ./main_cast_kernels-589283a85a93d289 --bench cast\ decimal32\ to\ uint8
==48981==
==48981== For interactive control, run 'callgrind_control -h'.
==48982==
==48982== Events    : Ir Bc Bcm Bi Bim
==48982== Collected : 663159 107990 5961 392 109
==48982==
==48982== I   refs:      663,159
==48982==
==48982== Branches:      108,382  (107,990 cond + 392 ind)
==48982== Mispredicts:     6,070  (  5,961 cond + 109 ind)
==48982== Mispred rate:      5.6% (    5.5%     + 27.8%   )
"cast decimal32 to uint8"
                        time:   [7.2994 ms 7.3039 ms 7.3086 ms]
                        change: [−0.6951% −0.5323% −0.3939%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
==48981==
==48981== Events    : Ir Bc Bcm Bi Bim
==48981== Collected : 4275699913 515214573 26957819 3927787 16825
==48981==
==48981== I   refs:      4,275,699,913
==48981==
==48981== Branches:        519,142,360  (515,214,573 cond + 3,927,787 ind)
==48981== Mispredicts:      26,974,644  ( 26,957,819 cond +    16,825 ind)
==48981== Mispred rate:            5.2% (        5.2%     +       0.4%   )
```

</p>
</details>

<details>
<summary>perf stat -e result</summary>
<p>

current branch

```
 Performance counter stats for './variant_cast_cast_kernels-589283a85a93d289 --bench cast decimal32 to uint8':

       28594921663      cycles:u                         #    2.495 GHz
   <not supported>      instructions:u
   <not supported>      cache-references:u
   <not supported>      cache-misses:u
   <not supported>      branch-misses:u
          11461.11 msec cpu-clock:u                      #    0.998 CPUs utilized
          11461.12 msec task-clock:u                     #    0.998 CPUs utilized

      11.479687283 seconds time elapsed

      11.462746000 seconds user
       0.000000000 seconds sys
```

main branch

```
[ec2-user@ip-172-31-35-37 ~]$ perf stat -e cycles,instructions,cache-references,cache-misses,branch-misses,cpu-clock,task-clock -- ./main_cast_kernels-589283a85a93d289 --bench "cast decimal32 to uint8"
"cast decimal32 to uint8"
                        time:   [86.483 µs 86.859 µs 87.249 µs]
                        change: [+1.3885% +1.9087% +2.4051%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

 Performance counter stats for './main_cast_kernels-589283a85a93d289 --bench cast decimal32 to uint8':

       29326652377      cycles:u                         #    2.424 GHz
   <not supported>      instructions:u
   <not supported>      cache-references:u
   <not supported>      cache-misses:u
   <not supported>      branch-misses:u
          12096.47 msec cpu-clock:u                      #    1.000 CPUs utilized
          12096.48 msec task-clock:u                     #    1.000 CPUs utilized

      12.100896685 seconds time elapsed

      12.087933000 seconds user
       0.009998000 seconds sys
```

</p>
</details>

<details>
<summary>CPU details (lscpu)</summary>

```
Architecture:             aarch64
  CPU op-mode(s):         32-bit, 64-bit
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                ARM
  Model name:             Neoverse-N1
    Model:                1
    Thread(s) per core:   1
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             r3p1
    BogoMIPS:             243.75
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
  L1d:                    128 KiB (2 instances)
  L1i:                    128 KiB (2 instances)
  L2:                     2 MiB (2 instances)
  L3:                     32 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0,1
Vulnerabilities:
  Gather data sampling:   Not affected
  Indirect target selection: Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Mitigation; CSV2, BHB
  Srbds:                  Not affected
  Tsa:                    Not affected
  Tsx async abort:        Not affected
  Vmscape:                Not affected
```

</details>
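For scale, the whole-process Callgrind totals from the valgrind comparison can be compared directly. Below is a minimal Rust sketch of that comparison; the `rel_change_pct` helper is hypothetical, written only for this illustration and not part of arrow-rs. It computes the relative change of each counter from the main branch to the current branch:

```rust
/// Hypothetical helper: relative change from `on_main` to `on_current`, in percent.
fn rel_change_pct(on_main: f64, on_current: f64) -> f64 {
    (on_current - on_main) / on_main * 100.0
}

fn main() {
    // (counter, main branch, current branch), copied from the Callgrind
    // summaries of the main benchmark processes (==48981== and ==48976==).
    let counters: [(&str, u64, u64); 3] = [
        ("I refs", 4_275_699_913, 4_275_596_549),
        ("Branches", 519_142_360, 519_230_265),
        ("Mispredicts", 26_974_644, 27_024_441),
    ];

    for (name, on_main, on_current) in counters {
        let pct = rel_change_pct(on_main as f64, on_current as f64);
        // Print each counter with its signed relative change.
        println!("{name:<12} main={on_main:>14} current={on_current:>14} change={pct:+.4}%");
    }
}
```

With these numbers the change is roughly -0.002% for instruction refs, +0.017% for branches, and +0.18% for mispredicts, so Callgrind sees about the same amount of work on both branches. These totals cover the whole benchmark process rather than a single iteration, so they are only a loose proxy for per-iteration cost.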
