klion26 commented on PR #9689:
URL: https://github.com/apache/arrow-rs/pull/9689#issuecomment-4396637437

   @scovich @alamb I ran the benchmark on EC2 (Graviton; instance and CPU info attached below), and the regression reproduces there. I also tried the following, but have not found any useful information yet:
   
   - compared the output of `perf stat -e cycles,instructions,cache-references,cache-misses,branch-misses,cpu-clock,task-clock` (details attached below)
   - compared the output of `valgrind --tool=callgrind --branch-sim=yes` (details attached below)
   - changed `#[inline(always)]` to `#[inline]`
   
   The `critcmp` result (left: main branch, right: current branch):
   
   <img width="1260" height="334" alt="3a9451e4c795a8d8349f85d53b52b5a9" src="https://github.com/user-attachments/assets/f54fbb8c-e5bd-4acd-9d0e-8b3adfea69ac" />
   
   
   > Instance type: t4g.small
   > uname: Linux ip-172-31-35-37.x.y.z.a.amzn2023.aarch64 #1 SMP Fri May  1 14:08:03 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux
   
   <details>
   <summary>valgrind comparison</summary>
   
   <p>
   
   current branch
   
   ```
   ==48976== Callgrind, a call-graph generating cache profiler
   ==48976== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
   ==48976== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
   ==48976== Command: ./variant_cast_cast_kernels-589283a85a93d289 --bench cast\ decimal32\ to\ uint8
   ==48976==
   ==48976== For interactive control, run 'callgrind_control -h'.
   ==48977==
   ==48977== Events    : Ir Bc Bcm Bi Bim
   ==48977== Collected : 663142 107976 5969 392 109
   ==48977==
   ==48977== I   refs:      663,142
   ==48977==
   ==48977== Branches:      108,368  (107,976 cond + 392 ind)
   ==48977== Mispredicts:     6,078  (  5,969 cond + 109 ind)
   ==48977== Mispred rate:      5.6% (    5.5%     + 27.8%   )
   "cast decimal32 to uint8"
                           time:   [7.3337 ms 7.3430 ms 7.3539 ms]
                           change: [+7692.6% +7730.7% +7769.2%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   ==48976==
   ==48976== Events    : Ir Bc Bcm Bi Bim
   ==48976== Collected : 4275596549 515310914 27010015 3919351 14426
   ==48976==
   ==48976== I   refs:      4,275,596,549
   ==48976==
   ==48976== Branches:        519,230,265  (515,310,914 cond + 3,919,351 ind)
   ==48976== Mispredicts:      27,024,441  ( 27,010,015 cond +    14,426 ind)
   ==48976== Mispred rate:            5.2% (        5.2%     +       0.4%   )
   ```
   
   main branch
   ```
   ==48981== Callgrind, a call-graph generating cache profiler
   ==48981== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
   ==48981== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
   ==48981== Command: ./main_cast_kernels-589283a85a93d289 --bench cast\ decimal32\ to\ uint8
   ==48981==
   ==48981== For interactive control, run 'callgrind_control -h'.
   ==48982==
   ==48982== Events    : Ir Bc Bcm Bi Bim
   ==48982== Collected : 663159 107990 5961 392 109
   ==48982==
   ==48982== I   refs:      663,159
   ==48982==
   ==48982== Branches:      108,382  (107,990 cond + 392 ind)
   ==48982== Mispredicts:     6,070  (  5,961 cond + 109 ind)
   ==48982== Mispred rate:      5.6% (    5.5%     + 27.8%   )
   "cast decimal32 to uint8"
                           time:   [7.2994 ms 7.3039 ms 7.3086 ms]
                           change: [−0.6951% −0.5323% −0.3939%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   ==48981==
   ==48981== Events    : Ir Bc Bcm Bi Bim
   ==48981== Collected : 4275699913 515214573 26957819 3927787 16825
   ==48981==
   ==48981== I   refs:      4,275,699,913
   ==48981==
   ==48981== Branches:        519,142,360  (515,214,573 cond + 3,927,787 ind)
   ==48981== Mispredicts:      26,974,644  ( 26,957,819 cond +    16,825 ind)
   ==48981== Mispred rate:            5.2% (        5.2%     +       0.4%   )
   ```
   
   </p>
   </details>
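
To put the two Callgrind summaries side by side, here is a minimal sketch that computes the relative change of each whole-run counter from main to the current branch. The counter values are copied from the benchmark-process summaries above (PIDs 48976 and 48981); the helper names `relative_change_pct` and `compare` are hypothetical and exist only for this illustration.

```python
# Counter totals copied from the two Callgrind summaries above
# (benchmark process only: PID 48981 for main, PID 48976 for current).
MAIN = {"Ir": 4_275_699_913, "branches": 519_142_360, "mispredicts": 26_974_644}
CURRENT = {"Ir": 4_275_596_549, "branches": 519_230_265, "mispredicts": 27_024_441}


def relative_change_pct(main: int, current: int) -> float:
    """Return the change from main to current as a percentage of main."""
    return (current - main) / main * 100.0


def compare(main: dict, current: dict) -> dict:
    """Map each counter name to its relative change in percent."""
    return {name: relative_change_pct(main[name], current[name]) for name in main}


if __name__ == "__main__":
    for name, pct in compare(MAIN, CURRENT).items():
        print(f"{name:12s} {pct:+.4f}%")
```

Every counter differs by well under 1% between the two binaries, which is consistent with the Callgrind totals alone not explaining the regression.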
   
   
   <details>
   <summary>perf stat -e result</summary>
   
   <p>
   
   current branch
   ```
   Performance counter stats for './variant_cast_cast_kernels-589283a85a93d289 --bench cast decimal32 to uint8':
   
          28594921663      cycles:u                         #    2.495 GHz
      <not supported>      instructions:u
      <not supported>      cache-references:u
      <not supported>      cache-misses:u
      <not supported>      branch-misses:u
             11461.11 msec cpu-clock:u                      #    0.998 CPUs utilized
             11461.12 msec task-clock:u                     #    0.998 CPUs utilized
   
         11.479687283 seconds time elapsed
   
         11.462746000 seconds user
          0.000000000 seconds sys
   
   
   ```
   
   main branch
   ```
   [ec2-user@ip-172-31-35-37 ~]$ perf stat -e cycles,instructions,cache-references,cache-misses,branch-misses,cpu-clock,task-clock -- ./main_cast_kernels-589283a85a93d289 --bench "cast decimal32 to uint8"
   "cast decimal32 to uint8"
                           time:   [86.483 µs 86.859 µs 87.249 µs]
                           change: [+1.3885% +1.9087% +2.4051%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   
    Performance counter stats for './main_cast_kernels-589283a85a93d289 --bench cast decimal32 to uint8':
   
          29326652377      cycles:u                         #    2.424 GHz
      <not supported>      instructions:u
      <not supported>      cache-references:u
      <not supported>      cache-misses:u
      <not supported>      branch-misses:u
             12096.47 msec cpu-clock:u                      #    1.000 CPUs utilized
             12096.48 msec task-clock:u                     #    1.000 CPUs utilized
   
         12.100896685 seconds time elapsed
   
         12.087933000 seconds user
          0.009998000 seconds sys
   ```
   </p>
   
   </details>
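
Since `cycles` is the only hardware counter reported as supported here, the sketch below cross-checks it against `task-clock` and compares the whole-run cycle totals of the two binaries. The numbers are copied from the `perf stat` output above; the names `effective_ghz` and `RUNS` are hypothetical, chosen only for this illustration.

```python
# (cycles, task-clock in msec) copied from the two `perf stat` runs above.
RUNS = {
    "current": (28_594_921_663, 11461.12),
    "main": (29_326_652_377, 12096.48),
}


def effective_ghz(cycles: int, task_clock_msec: float) -> float:
    """Average clock rate implied by the cycle count and the task-clock time."""
    seconds = task_clock_msec / 1000.0
    return cycles / seconds / 1e9


if __name__ == "__main__":
    for name, (cycles, msec) in RUNS.items():
        print(f"{name:8s} {effective_ghz(cycles, msec):.3f} GHz")
    ratio = RUNS["current"][0] / RUNS["main"][0]
    print(f"current/main cycles: {ratio:.4f}")
```

The implied clock rates agree with the GHz figures perf prints, and the whole-run cycle totals are within a few percent of each other. One caveat, assuming Criterion's default behavior of sizing the iteration count to a target measurement time: whole-run totals like these would stay roughly constant even when the per-iteration cost changes, so they may not be the right place to look for the regression.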
   
   
   <details>
   <summary>cpu details(lscpu)</summary>
   
   
   ```
   Architecture:                aarch64
     CPU op-mode(s):            32-bit, 64-bit
     Byte Order:                Little Endian
   CPU(s):                      2
     On-line CPU(s) list:       0,1
   Vendor ID:                   ARM
     Model name:                Neoverse-N1
       Model:                   1
       Thread(s) per core:      1
       Core(s) per socket:      2
       Socket(s):               1
       Stepping:                r3p1
       BogoMIPS:                243.75
       Flags:                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
   Caches (sum of all):
     L1d:                       128 KiB (2 instances)
     L1i:                       128 KiB (2 instances)
     L2:                        2 MiB (2 instances)
     L3:                        32 MiB (1 instance)
   NUMA:
     NUMA node(s):              1
     NUMA node0 CPU(s):         0,1
   Vulnerabilities:
     Gather data sampling:      Not affected
     Indirect target selection: Not affected
     Itlb multihit:             Not affected
     L1tf:                      Not affected
     Mds:                       Not affected
     Meltdown:                  Not affected
     Mmio stale data:           Not affected
     Reg file data sampling:    Not affected
     Retbleed:                  Not affected
     Spec rstack overflow:      Not affected
     Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
     Spectre v1:                Mitigation; __user pointer sanitization
     Spectre v2:                Mitigation; CSV2, BHB
     Srbds:                     Not affected
     Tsa:                       Not affected
     Tsx async abort:           Not affected
     Vmscape:                   Not affected
   ```
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
