mapleFU commented on PR #37940:
URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745442093
I'm interested in the performance impact of this change, so I ran the encoding
and decoding benchmarks on macOS (M1 Pro) with a `RelWithDebInfo` build at O2.
Encoding doesn't change much. For decoding, I guess we previously tended to
use 32-bit and 64-bit widths, which wastes space but benefits decoding speed.
A smaller bit width makes decoding a bit slower. However, I think we should
merge this patch: it makes `DELTA_BINARY_PACKED` output much smaller in most
cases, and much better in truly "delta" cases.
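To illustrate the size side of that trade-off, here is a minimal sketch of how the packed bit width depends on the spread of the deltas. It is not the Arrow implementation: the helper names `packed_bit_width` and `packed_bytes` are made up for this example, the whole input is treated as a single miniblock, and Python's unbounded integers sidestep the overflow handling a real encoder needs.

```python
from typing import Sequence


def packed_bit_width(values: Sequence[int]) -> int:
    """Bits needed per delta once the minimum delta is subtracted.

    Hypothetical helper: treats `values` as one miniblock.
    """
    if len(values) < 2:
        return 0
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Every adjusted delta is >= 0, so the largest one decides the width.
    return max(d - min_delta for d in deltas).bit_length()


def packed_bytes(num_deltas: int, bit_width: int) -> int:
    """Bytes used to bit-pack `num_deltas` values at `bit_width` bits each."""
    return (num_deltas * bit_width + 7) // 8


fixed_stride = [0, 7, 14, 21, 28]      # all deltas equal
narrow = [100, 101, 103, 104, 107]     # deltas 1, 2, 1, 3

print(packed_bit_width(fixed_stride))  # width for a constant stride
print(packed_bit_width(narrow))        # width for a small spread of deltas
print(packed_bytes(32, 2), "bytes vs", packed_bytes(32, 32), "bytes at 32 bits")
```

The narrower the width, the fewer bytes are written, but the decoder has to unpack values that no longer line up with native 32-bit or 64-bit words, which is one plausible reason for the slower decode numbers.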
## Encode
Before:
```
BM_DeltaBitPackingEncode_Int32_Fixed/1024 7388 ns 6727 ns
93419 bytes_per_second=580.658M/s items_per_second=152.216M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096 46325 ns 27672 ns
27080 bytes_per_second=564.655M/s items_per_second=148.021M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768 221001 ns 205395 ns
3243 bytes_per_second=608.584M/s items_per_second=159.537M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536 531356 ns 423160 ns
1679 bytes_per_second=590.793M/s items_per_second=154.873M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024 8327 ns 6813 ns
105758 bytes_per_second=1.11987G/s items_per_second=150.306M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096 32330 ns 27128 ns
25970 bytes_per_second=1.12495G/s items_per_second=150.988M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768 239290 ns 210625 ns
3289 bytes_per_second=1.15913G/s items_per_second=155.575M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536 417320 ns 405689 ns
1687 bytes_per_second=1.20359G/s items_per_second=161.543M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024 8008 ns 7857 ns
91353 bytes_per_second=497.153M/s items_per_second=130.326M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096 29652 ns 29567 ns
23353 bytes_per_second=528.467M/s items_per_second=138.534M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768 274176 ns 258310 ns
2694 bytes_per_second=483.915M/s items_per_second=126.856M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536 600979 ns 557093 ns
1000 bytes_per_second=448.758M/s items_per_second=117.639M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024 10400 ns 10082 ns
68428 bytes_per_second=774.872M/s items_per_second=101.564M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096 48600 ns 46734 ns
14683 bytes_per_second=668.678M/s items_per_second=87.645M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768 372571 ns 358108 ns
1978 bytes_per_second=698.113M/s items_per_second=91.5031M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536 693363 ns 687021 ns
1029 bytes_per_second=727.779M/s items_per_second=95.3915M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024 8086 ns 7889 ns
90200 bytes_per_second=495.166M/s items_per_second=129.805M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096 31668 ns 30423 ns
23291 bytes_per_second=513.592M/s items_per_second=134.635M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768 269229 ns 262281 ns
2667 bytes_per_second=476.588M/s items_per_second=124.935M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536 517646 ns 506281 ns
1395 bytes_per_second=493.797M/s items_per_second=129.446M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024 10090 ns 10087 ns
69206 bytes_per_second=774.544M/s items_per_second=101.521M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096 46402 ns 46005 ns
15212 bytes_per_second=679.276M/s items_per_second=89.0341M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768 361227 ns 356360 ns
1967 bytes_per_second=701.538M/s items_per_second=91.952M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536 687265 ns 687060 ns
1013 bytes_per_second=727.738M/s items_per_second=95.3861M/s
```
After:
```
BM_DeltaBitPackingEncode_Int32_Fixed/1024 6746 ns 6622 ns
107996 bytes_per_second=589.889M/s items_per_second=154.636M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096 26207 ns 25429 ns
27415 bytes_per_second=614.466M/s items_per_second=161.078M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768 236115 ns 207114 ns
3471 bytes_per_second=603.534M/s items_per_second=158.213M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536 450868 ns 416761 ns
1668 bytes_per_second=599.864M/s items_per_second=157.251M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024 6489 ns 6484 ns
108361 bytes_per_second=1.17666G/s items_per_second=157.929M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096 25210 ns 25206 ns
27800 bytes_per_second=1.21072G/s items_per_second=162.5M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768 202326 ns 202064 ns
3460 bytes_per_second=1.20823G/s items_per_second=162.166M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536 403463 ns 403353 ns
1743 bytes_per_second=1.21056G/s items_per_second=162.478M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024 7066 ns 7062 ns
99590 bytes_per_second=553.105M/s items_per_second=144.993M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096 26993 ns 26980 ns
26047 bytes_per_second=579.125M/s items_per_second=151.814M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768 232130 ns 227611 ns
3087 bytes_per_second=549.182M/s items_per_second=143.965M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536 445752 ns 444218 ns
1574 bytes_per_second=562.787M/s items_per_second=147.531M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024 6998 ns 6994 ns
100485 bytes_per_second=1116.97M/s items_per_second=146.404M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096 26963 ns 26955 ns
25345 bytes_per_second=1.13219G/s items_per_second=151.96M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768 224846 ns 223845 ns
3141 bytes_per_second=1116.85M/s items_per_second=146.387M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536 440290 ns 440284 ns
1591 bytes_per_second=1.10901G/s items_per_second=148.849M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024 7925 ns 7923 ns
87495 bytes_per_second=493.032M/s items_per_second=129.245M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096 30254 ns 30251 ns
23146 bytes_per_second=516.518M/s items_per_second=135.402M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768 256295 ns 256292 ns
2714 bytes_per_second=487.725M/s items_per_second=127.854M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536 507129 ns 500402 ns
1399 bytes_per_second=499.598M/s items_per_second=130.967M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024 10406 ns 10405 ns
67149 bytes_per_second=750.859M/s items_per_second=98.4165M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096 45677 ns 45439 ns
15300 bytes_per_second=687.734M/s items_per_second=90.1427M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768 346803 ns 346780 ns
2019 bytes_per_second=720.919M/s items_per_second=94.4923M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536 710268 ns 705693 ns
964 bytes_per_second=708.523M/s items_per_second=92.8676M/s
```
## Decode
Before:
```
Run on (10 X 24.1205 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x10)
Load Average: 11.76, 8.38, 5.93
------------------------------------------------------------------------------------------------------
Benchmark Time CPU
Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024 1170 ns 1140 ns
617551 bytes_per_second=3.34478G/s items_per_second=897.857M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096 4223 ns 4111 ns
171441 bytes_per_second=3.71136G/s items_per_second=996.261M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768 33916 ns 32218 ns
22054 bytes_per_second=3.78883G/s items_per_second=1017.06M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536 65659 ns 64014 ns
11133 bytes_per_second=3.81385G/s items_per_second=1023.77M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024 1170 ns 1124 ns
630523 bytes_per_second=6.7877G/s items_per_second=911.029M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096 3725 ns 3723 ns
188036 bytes_per_second=8.19693G/s items_per_second=1.10017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768 29321 ns 29308 ns
23828 bytes_per_second=8.33016G/s items_per_second=1.11805G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536 58346 ns 58318 ns
11880 bytes_per_second=8.37271G/s items_per_second=1.12377G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024 1068 ns 1068 ns
664976 bytes_per_second=3.5725G/s items_per_second=958.986M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096 4082 ns 4028 ns
176849 bytes_per_second=3.78799G/s items_per_second=1016.83M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768 31969 ns 31932 ns
22023 bytes_per_second=3.82279G/s items_per_second=1026.17M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536 63682 ns 63643 ns
11016 bytes_per_second=3.83611G/s items_per_second=1029.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024 934 ns 932 ns
748351 bytes_per_second=8.18632G/s items_per_second=1098.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096 3404 ns 3403 ns
202131 bytes_per_second=8.96808G/s items_per_second=1.20367G/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768 29199 ns 29196 ns
24017 bytes_per_second=8.36203G/s items_per_second=1.12233G/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536 57770 ns 57768 ns
12077 bytes_per_second=8.45249G/s items_per_second=1.13447G/s
BM_DeltaBitPackingDecode_Int32_Wide/1024 1087 ns 1087 ns
643548 bytes_per_second=3.50993G/s items_per_second=942.189M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096 4087 ns 4087 ns
172062 bytes_per_second=3.73328G/s items_per_second=1002.15M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768 32799 ns 32797 ns
21498 bytes_per_second=3.72199G/s items_per_second=999.114M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536 65459 ns 65453 ns
10717 bytes_per_second=3.73004G/s items_per_second=1001.27M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024 1016 ns 1016 ns
687198 bytes_per_second=7.5105G/s items_per_second=1008.04M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096 3742 ns 3742 ns
187931 bytes_per_second=8.15518G/s items_per_second=1094.57M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768 31511 ns 31509 ns
22198 bytes_per_second=7.74827G/s items_per_second=1039.96M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536 62441 ns 62433 ns
11171 bytes_per_second=7.82092G/s items_per_second=1049.71M/s
```
After:
```
------------------------------------------------------------------------------------------------------
Benchmark Time CPU
Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024 1524 ns 1210 ns
582799 bytes_per_second=3.15193G/s items_per_second=846.089M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096 6155 ns 4283 ns
154236 bytes_per_second=3.56298G/s items_per_second=956.43M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768 40811 ns 32998 ns
20873 bytes_per_second=3.69928G/s items_per_second=993.019M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536 73049 ns 64550 ns
10534 bytes_per_second=3.78218G/s items_per_second=1015.27M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024 1164 ns 1119 ns
628761 bytes_per_second=6.81583G/s items_per_second=914.806M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096 3824 ns 3780 ns
186756 bytes_per_second=8.07281G/s items_per_second=1083.51M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768 30175 ns 29516 ns
24098 bytes_per_second=8.2714G/s items_per_second=1.11017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536 60017 ns 59018 ns
11983 bytes_per_second=8.27339G/s items_per_second=1.11044G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024 1381 ns 1378 ns
501986 bytes_per_second=2.76793G/s items_per_second=743.01M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096 5404 ns 5369 ns
131067 bytes_per_second=2.84204G/s items_per_second=762.903M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768 45844 ns 43339 ns
16220 bytes_per_second=2.81666G/s items_per_second=756.091M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536 86916 ns 84916 ns
8257 bytes_per_second=2.87509G/s items_per_second=771.777M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024 1248 ns 1159 ns
615866 bytes_per_second=6.58504G/s items_per_second=883.829M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096 4298 ns 4296 ns
162309 bytes_per_second=7.10393G/s items_per_second=953.473M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768 35427 ns 35378 ns
19834 bytes_per_second=6.90082G/s items_per_second=926.212M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536 70880 ns 70862 ns
9877 bytes_per_second=6.89062G/s items_per_second=924.844M/s
BM_DeltaBitPackingDecode_Int32_Wide/1024 1360 ns 1359 ns
515479 bytes_per_second=2.80606G/s items_per_second=753.246M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096 5124 ns 5121 ns
136309 bytes_per_second=2.97938G/s items_per_second=799.772M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768 40943 ns 40924 ns
16860 bytes_per_second=2.98282G/s items_per_second=800.695M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536 82019 ns 81988 ns
8431 bytes_per_second=2.97777G/s items_per_second=799.34M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024 1278 ns 1278 ns
548551 bytes_per_second=5.97075G/s items_per_second=801.38M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096 4778 ns 4777 ns
146543 bytes_per_second=6.38874G/s items_per_second=857.482M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768 38399 ns 38390 ns
18267 bytes_per_second=6.35954G/s items_per_second=853.563M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536 76813 ns 76764 ns
9079 bytes_per_second=6.36081G/s items_per_second=853.734M/s
```
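To put the numbers above in relative terms, the sketch below computes the change in CPU time for a few rows. The values are copied from the CPU column of the runs above; the helper name `relative_change` and the choice of rows are just for illustration.

```python
def relative_change(before_ns: float, after_ns: float) -> float:
    """Fractional change in time; negative means the 'after' run is faster."""
    return (after_ns - before_ns) / before_ns


# (before CPU ns, after CPU ns), taken from the benchmark output above.
samples = {
    "Encode_Int64_Narrow/65536": (687021, 440284),
    "Decode_Int32_Narrow/65536": (63643, 84916),
    "Decode_Int64_Wide/65536": (62433, 76764),
}

for name, (before_ns, after_ns) in samples.items():
    print(f"{name}: {relative_change(before_ns, after_ns):+.1%}")
```

A single run on a loaded machine is noisy, so these percentages are only a rough indication.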