cyb70289 commented on PR #40335:
URL: https://github.com/apache/arrow/pull/40335#issuecomment-1987511710
Tested on Neoverse-N1. With clang, I see a performance improvement in both
the encoder and the decoder. With gcc, the decoder improves but the encoder shows some drop.
**- clang-16, improvement from both encoder and decoder**
```
decode (improve)
----------------
BM_ByteStreamSplitDecode_Float_Scalar/1024 1167 ns 1167 ns 600395 bytes_per_second=3.27015Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/4096 4648 ns 4648 ns 150615 bytes_per_second=3.28313Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/32768 38248 ns 38247 ns 18300 bytes_per_second=3.19159Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/65536 76448 ns 76446 ns 9159 bytes_per_second=3.19363Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/1024 2814 ns 2814 ns 248735 bytes_per_second=2.71086Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/4096 11236 ns 11236 ns 62307 bytes_per_second=2.7161Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/32768 92623 ns 92616 ns 7551 bytes_per_second=2.63604Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/65536 188190 ns 188185 ns 3728 bytes_per_second=2.59469Gi/s
BM_ByteStreamSplitDecode_Float_Neon/1024 817 ns 817 ns 856316 bytes_per_second=4.66674Gi/s
BM_ByteStreamSplitDecode_Float_Neon/4096 3240 ns 3240 ns 216075 bytes_per_second=4.71005Gi/s
BM_ByteStreamSplitDecode_Float_Neon/32768 26981 ns 26981 ns 25942 bytes_per_second=4.52429Gi/s
BM_ByteStreamSplitDecode_Float_Neon/65536 54189 ns 54186 ns 12924 bytes_per_second=4.50564Gi/s
BM_ByteStreamSplitDecode_Double_Neon/1024 1767 ns 1767 ns 396110 bytes_per_second=4.31715Gi/s
BM_ByteStreamSplitDecode_Double_Neon/4096 7138 ns 7137 ns 98106 bytes_per_second=4.27568Gi/s
BM_ByteStreamSplitDecode_Double_Neon/32768 64999 ns 64997 ns 10779 bytes_per_second=3.75616Gi/s
BM_ByteStreamSplitDecode_Double_Neon/65536 130243 ns 130243 ns 5366 bytes_per_second=3.74901Gi/s
encode (improve)
----------------
BM_ByteStreamSplitEncode_Float_Scalar/1024 1482 ns 1482 ns 472507 bytes_per_second=2.57419Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/4096 5897 ns 5897 ns 118700 bytes_per_second=2.58776Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/32768 47959 ns 47956 ns 14597 bytes_per_second=2.54548Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/65536 95903 ns 95896 ns 7298 bytes_per_second=2.54588Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/1024 2950 ns 2950 ns 237274 bytes_per_second=2.58627Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/4096 11786 ns 11786 ns 59393 bytes_per_second=2.58938Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/32768 98141 ns 98138 ns 7133 bytes_per_second=2.48773Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/65536 198219 ns 198203 ns 3531 bytes_per_second=2.46354Gi/s
BM_ByteStreamSplitEncode_Float_Neon/1024 1152 ns 1152 ns 607844 bytes_per_second=3.31275Gi/s
BM_ByteStreamSplitEncode_Float_Neon/4096 4571 ns 4570 ns 153146 bytes_per_second=3.33858Gi/s
BM_ByteStreamSplitEncode_Float_Neon/32768 37086 ns 37084 ns 18873 bytes_per_second=3.29172Gi/s
BM_ByteStreamSplitEncode_Float_Neon/65536 74336 ns 74336 ns 9417 bytes_per_second=3.2843Gi/s
BM_ByteStreamSplitEncode_Double_Neon/1024 1978 ns 1978 ns 353156 bytes_per_second=3.85706Gi/s
BM_ByteStreamSplitEncode_Double_Neon/4096 7947 ns 7947 ns 87879 bytes_per_second=3.84032Gi/s
BM_ByteStreamSplitEncode_Double_Neon/32768 64458 ns 64458 ns 10863 bytes_per_second=3.7876Gi/s
BM_ByteStreamSplitEncode_Double_Neon/65536 128693 ns 128689 ns 5440 bytes_per_second=3.79428Gi/s
```
**- gcc-13, decoder improves, but encoder drops**
```
decode (improve)
----------------
BM_ByteStreamSplitDecode_Float_Scalar/1024 1133 ns 1133 ns 617695 bytes_per_second=3.3663Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/4096 4484 ns 4484 ns 156105 bytes_per_second=3.40284Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/32768 36318 ns 36318 ns 19273 bytes_per_second=3.36116Gi/s
BM_ByteStreamSplitDecode_Float_Scalar/65536 73048 ns 73047 ns 9554 bytes_per_second=3.34225Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/1024 2814 ns 2814 ns 248738 bytes_per_second=2.7114Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/4096 11227 ns 11226 ns 62355 bytes_per_second=2.71838Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/32768 92482 ns 92478 ns 7552 bytes_per_second=2.64Gi/s
BM_ByteStreamSplitDecode_Double_Scalar/65536 185853 ns 185844 ns 3748 bytes_per_second=2.62737Gi/s
BM_ByteStreamSplitDecode_Float_Neon/1024 775 ns 775 ns 903307 bytes_per_second=4.92282Gi/s
BM_ByteStreamSplitDecode_Float_Neon/4096 3061 ns 3061 ns 228720 bytes_per_second=4.98565Gi/s
BM_ByteStreamSplitDecode_Float_Neon/32768 25543 ns 25542 ns 27405 bytes_per_second=4.77925Gi/s
BM_ByteStreamSplitDecode_Float_Neon/65536 51478 ns 51474 ns 13609 bytes_per_second=4.74294Gi/s
BM_ByteStreamSplitDecode_Double_Neon/1024 1626 ns 1626 ns 429095 bytes_per_second=4.69278Gi/s
BM_ByteStreamSplitDecode_Double_Neon/4096 6485 ns 6485 ns 107513 bytes_per_second=4.70567Gi/s
BM_ByteStreamSplitDecode_Double_Neon/32768 59680 ns 59680 ns 11757 bytes_per_second=4.09083Gi/s
BM_ByteStreamSplitDecode_Double_Neon/65536 120697 ns 120688 ns 5594 bytes_per_second=4.04582Gi/s
encode (drop)
-------------
BM_ByteStreamSplitEncode_Float_Scalar/1024 1142 ns 1142 ns 613228 bytes_per_second=3.34041Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/4096 4511 ns 4511 ns 155178 bytes_per_second=3.3825Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/32768 37560 ns 37560 ns 18636 bytes_per_second=3.25003Gi/s
BM_ByteStreamSplitEncode_Float_Scalar/65536 75348 ns 75343 ns 9301 bytes_per_second=3.2404Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/1024 2201 ns 2201 ns 318028 bytes_per_second=3.46606Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/4096 8795 ns 8795 ns 79615 bytes_per_second=3.46994Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/32768 77388 ns 77383 ns 9045 bytes_per_second=3.15497Gi/s
BM_ByteStreamSplitEncode_Double_Scalar/65536 153900 ns 153900 ns 4543 bytes_per_second=3.17272Gi/s
BM_ByteStreamSplitEncode_Float_Neon/1024 1238 ns 1238 ns 565551 bytes_per_second=3.08201Gi/s
BM_ByteStreamSplitEncode_Float_Neon/4096 4894 ns 4893 ns 143073 bytes_per_second=3.11821Gi/s
BM_ByteStreamSplitEncode_Float_Neon/32768 39594 ns 39594 ns 17679 bytes_per_second=3.08304Gi/s
BM_ByteStreamSplitEncode_Float_Neon/65536 79201 ns 79200 ns 8838 bytes_per_second=3.0826Gi/s
BM_ByteStreamSplitEncode_Double_Neon/1024 2573 ns 2573 ns 272609 bytes_per_second=2.96532Gi/s
BM_ByteStreamSplitEncode_Double_Neon/4096 10249 ns 10248 ns 68149 bytes_per_second=2.97782Gi/s
BM_ByteStreamSplitEncode_Double_Neon/32768 88791 ns 88791 ns 7884 bytes_per_second=2.7496Gi/s
BM_ByteStreamSplitEncode_Double_Neon/65536 176888 ns 176888 ns 3958 bytes_per_second=2.7604Gi/s
```
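The comment does not say which baseline "improve" and "drop" are measured against. One comparison that can be read straight from the numbers above is Neon versus Scalar within the same build. The sketch below is only an illustration: it uses the `bytes_per_second` figures for the 1024 size, and the dictionary layout and the `neon_over_scalar` helper are hypothetical names, not part of the benchmark or of Arrow.

```python
# Throughput in GiB/s for the /1024 case, copied from the benchmark output above.
# Key: (compiler, operation, type) -> (scalar, neon)
results = {
    ("clang-16", "decode", "float"): (3.27015, 4.66674),
    ("clang-16", "decode", "double"): (2.71086, 4.31715),
    ("clang-16", "encode", "float"): (2.57419, 3.31275),
    ("clang-16", "encode", "double"): (2.58627, 3.85706),
    ("gcc-13", "decode", "float"): (3.3663, 4.92282),
    ("gcc-13", "decode", "double"): (2.7114, 4.69278),
    ("gcc-13", "encode", "float"): (3.34041, 3.08201),
    ("gcc-13", "encode", "double"): (3.46606, 2.96532),
}


def neon_over_scalar(scalar, neon):
    """Return the Neon throughput as a multiple of the Scalar throughput."""
    return neon / scalar


# A ratio above 1.0 means the Neon path is faster than the Scalar path.
ratios = {key: neon_over_scalar(*pair) for key, pair in results.items()}

for (compiler, op, dtype), ratio in ratios.items():
    print(f"{compiler:9} {op:7} {dtype:7} neon/scalar = {ratio:.2f}x")
```

With these inputs, the only ratios below 1.0 are the two gcc-13 encode cases, which matches the "encode (drop)" label in the gcc-13 section.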