cyb70289 commented on PR #40335:
URL: https://github.com/apache/arrow/pull/40335#issuecomment-1987511710

   Tested on Neoverse-N1. With clang, I see a performance improvement in both the
encoder and the decoder. With gcc, however, there is some drop in the encoder.
   
   **- clang-16, improvement from both encoder and decoder**
   ```
   decode (improve)
   ----------------
   BM_ByteStreamSplitDecode_Float_Scalar/1024         1167 ns         1167 ns       600395 bytes_per_second=3.27015Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/4096         4648 ns         4648 ns       150615 bytes_per_second=3.28313Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/32768       38248 ns        38247 ns        18300 bytes_per_second=3.19159Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/65536       76448 ns        76446 ns         9159 bytes_per_second=3.19363Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/1024        2814 ns         2814 ns       248735 bytes_per_second=2.71086Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/4096       11236 ns        11236 ns        62307 bytes_per_second=2.7161Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/32768      92623 ns        92616 ns         7551 bytes_per_second=2.63604Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/65536     188190 ns       188185 ns         3728 bytes_per_second=2.59469Gi/s
   
   BM_ByteStreamSplitDecode_Float_Neon/1024            817 ns          817 ns       856316 bytes_per_second=4.66674Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/4096           3240 ns         3240 ns       216075 bytes_per_second=4.71005Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/32768         26981 ns        26981 ns        25942 bytes_per_second=4.52429Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/65536         54189 ns        54186 ns        12924 bytes_per_second=4.50564Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/1024          1767 ns         1767 ns       396110 bytes_per_second=4.31715Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/4096          7138 ns         7137 ns        98106 bytes_per_second=4.27568Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/32768        64999 ns        64997 ns        10779 bytes_per_second=3.75616Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/65536       130243 ns       130243 ns         5366 bytes_per_second=3.74901Gi/s
   
   encode (improve)
   ----------------
   BM_ByteStreamSplitEncode_Float_Scalar/1024         1482 ns         1482 ns       472507 bytes_per_second=2.57419Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/4096         5897 ns         5897 ns       118700 bytes_per_second=2.58776Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/32768       47959 ns        47956 ns        14597 bytes_per_second=2.54548Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/65536       95903 ns        95896 ns         7298 bytes_per_second=2.54588Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/1024        2950 ns         2950 ns       237274 bytes_per_second=2.58627Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/4096       11786 ns        11786 ns        59393 bytes_per_second=2.58938Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/32768      98141 ns        98138 ns         7133 bytes_per_second=2.48773Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/65536     198219 ns       198203 ns         3531 bytes_per_second=2.46354Gi/s
   
   BM_ByteStreamSplitEncode_Float_Neon/1024           1152 ns         1152 ns       607844 bytes_per_second=3.31275Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/4096           4571 ns         4570 ns       153146 bytes_per_second=3.33858Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/32768         37086 ns        37084 ns        18873 bytes_per_second=3.29172Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/65536         74336 ns        74336 ns         9417 bytes_per_second=3.2843Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/1024          1978 ns         1978 ns       353156 bytes_per_second=3.85706Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/4096          7947 ns         7947 ns        87879 bytes_per_second=3.84032Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/32768        64458 ns        64458 ns        10863 bytes_per_second=3.7876Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/65536       128693 ns       128689 ns         5440 bytes_per_second=3.79428Gi/s
   ```
   
   **- gcc-13, decoder improves, but encoder drops**
   ```
   decode (improve)
   ----------------
   BM_ByteStreamSplitDecode_Float_Scalar/1024         1133 ns         1133 ns       617695 bytes_per_second=3.3663Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/4096         4484 ns         4484 ns       156105 bytes_per_second=3.40284Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/32768       36318 ns        36318 ns        19273 bytes_per_second=3.36116Gi/s
   BM_ByteStreamSplitDecode_Float_Scalar/65536       73048 ns        73047 ns         9554 bytes_per_second=3.34225Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/1024        2814 ns         2814 ns       248738 bytes_per_second=2.7114Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/4096       11227 ns        11226 ns        62355 bytes_per_second=2.71838Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/32768      92482 ns        92478 ns         7552 bytes_per_second=2.64Gi/s
   BM_ByteStreamSplitDecode_Double_Scalar/65536     185853 ns       185844 ns         3748 bytes_per_second=2.62737Gi/s
   
   BM_ByteStreamSplitDecode_Float_Neon/1024            775 ns          775 ns       903307 bytes_per_second=4.92282Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/4096           3061 ns         3061 ns       228720 bytes_per_second=4.98565Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/32768         25543 ns        25542 ns        27405 bytes_per_second=4.77925Gi/s
   BM_ByteStreamSplitDecode_Float_Neon/65536         51478 ns        51474 ns        13609 bytes_per_second=4.74294Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/1024          1626 ns         1626 ns       429095 bytes_per_second=4.69278Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/4096          6485 ns         6485 ns       107513 bytes_per_second=4.70567Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/32768        59680 ns        59680 ns        11757 bytes_per_second=4.09083Gi/s
   BM_ByteStreamSplitDecode_Double_Neon/65536       120697 ns       120688 ns         5594 bytes_per_second=4.04582Gi/s
   
   encode (drop)
   -------------
   BM_ByteStreamSplitEncode_Float_Scalar/1024         1142 ns         1142 ns       613228 bytes_per_second=3.34041Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/4096         4511 ns         4511 ns       155178 bytes_per_second=3.3825Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/32768       37560 ns        37560 ns        18636 bytes_per_second=3.25003Gi/s
   BM_ByteStreamSplitEncode_Float_Scalar/65536       75348 ns        75343 ns         9301 bytes_per_second=3.2404Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/1024        2201 ns         2201 ns       318028 bytes_per_second=3.46606Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/4096        8795 ns         8795 ns        79615 bytes_per_second=3.46994Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/32768      77388 ns        77383 ns         9045 bytes_per_second=3.15497Gi/s
   BM_ByteStreamSplitEncode_Double_Scalar/65536     153900 ns       153900 ns         4543 bytes_per_second=3.17272Gi/s
   
   BM_ByteStreamSplitEncode_Float_Neon/1024           1238 ns         1238 ns       565551 bytes_per_second=3.08201Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/4096           4894 ns         4893 ns       143073 bytes_per_second=3.11821Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/32768         39594 ns        39594 ns        17679 bytes_per_second=3.08304Gi/s
   BM_ByteStreamSplitEncode_Float_Neon/65536         79201 ns        79200 ns         8838 bytes_per_second=3.0826Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/1024          2573 ns         2573 ns       272609 bytes_per_second=2.96532Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/4096         10249 ns        10248 ns        68149 bytes_per_second=2.97782Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/32768        88791 ns        88791 ns         7884 bytes_per_second=2.7496Gi/s
   BM_ByteStreamSplitEncode_Double_Neon/65536       176888 ns       176888 ns         3958 bytes_per_second=2.7604Gi/s
   ```
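
For a quick read of the numbers, here is a small sketch that computes the Neon-to-Scalar throughput ratio for the `/1024` rows. It assumes that "improve" and "drop" refer to Neon relative to Scalar within the same build, which the comment does not state explicitly. The `results` layout and the `speedup` helper are illustrative names, not part of the benchmark code.

```python
# Throughput in GiB/s for the /1024 size, copied from the benchmark output.
# Layout (illustrative): results[compiler][(op, dtype)] = (scalar, neon)
results = {
    "clang-16": {
        ("decode", "float"): (3.27015, 4.66674),
        ("decode", "double"): (2.71086, 4.31715),
        ("encode", "float"): (2.57419, 3.31275),
        ("encode", "double"): (2.58627, 3.85706),
    },
    "gcc-13": {
        ("decode", "float"): (3.3663, 4.92282),
        ("decode", "double"): (2.7114, 4.69278),
        ("encode", "float"): (3.34041, 3.08201),
        ("encode", "double"): (3.46606, 2.96532),
    },
}


def speedup(scalar, neon):
    """Return the Neon/Scalar throughput ratio; below 1.0 means Neon is slower."""
    return neon / scalar


for compiler, rows in results.items():
    for (op, dtype), (scalar, neon) in rows.items():
        ratio = speedup(scalar, neon)
        print(f"{compiler:9s} {op} {dtype:6s} neon/scalar = {ratio:.2f}x")
```

Under that assumption, a ratio above 1.0 matches the "improve" label and a ratio below 1.0 matches "drop".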

