mapleFU commented on PR #41690: URL: https://github.com/apache/arrow/pull/41690#issuecomment-2116627942
On my M1 MacOS with -O3:

Current:

```
-------------------------------------------------------------------------------------------------
Benchmark                                     Time         CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------
ReferenceNaiveBitmapReader/8192           85979 ns    85441 ns      7947 bytes_per_second=182.874M/s
BitmapReader/8192                         67869 ns    64637 ns     11012 bytes_per_second=241.734M/s
BitmapUInt64Reader/8192                     678 ns      670 ns   1051114 bytes_per_second=11.386G/s
BitRunReader/-1                            9423 ns     9399 ns     74456 bytes_per_second=51.95M/s
BitRunReader/0                              148 ns      148 ns   4539942 bytes_per_second=3.2326G/s
BitRunReader/10                            1732 ns     1731 ns    407401 bytes_per_second=282.108M/s
BitRunReader/25                            3471 ns     3470 ns    202346 bytes_per_second=140.726M/s
BitRunReader/50                            4864 ns     4857 ns    144606 bytes_per_second=100.526M/s
BitRunReader/60                            4547 ns     4545 ns    153297 bytes_per_second=107.438M/s
BitRunReader/75                            3610 ns     3607 ns    189480 bytes_per_second=135.387M/s
BitRunReader/99                             390 ns      390 ns   1796424 bytes_per_second=1.22331G/s
BitRunReaderLinear/-1                      5802 ns     5796 ns    121120 bytes_per_second=84.2382M/s
BitRunReaderLinear/0                       2625 ns     2624 ns    265712 bytes_per_second=186.104M/s
BitRunReaderLinear/10                      3153 ns     3152 ns    221187 bytes_per_second=154.922M/s
BitRunReaderLinear/25                      3916 ns     3914 ns    167117 bytes_per_second=124.75M/s
BitRunReaderLinear/50                      4402 ns     4401 ns    153699 bytes_per_second=110.958M/s
BitRunReaderLinear/60                      4421 ns     4416 ns    158283 bytes_per_second=110.568M/s
BitRunReaderLinear/75                      3962 ns     3961 ns    176550 bytes_per_second=123.271M/s
BitRunReaderLinear/99                      3112 ns     3068 ns    234100 bytes_per_second=159.176M/s
SetBitRunReader/-1                         9961 ns     9956 ns     70916 bytes_per_second=49.0416M/s
SetBitRunReader/0                          36.0 ns     35.9 ns  19570128 bytes_per_second=13.2686G/s
SetBitRunReader/10                         1744 ns     1743 ns    398788 bytes_per_second=280.208M/s
SetBitRunReader/25                         4255 ns     4205 ns    166620 bytes_per_second=116.132M/s
SetBitRunReader/50                         5598 ns     5575 ns    124327 bytes_per_second=87.5788M/s
SetBitRunReader/60                         5371 ns     5348 ns    131510 bytes_per_second=91.2951M/s
SetBitRunReader/75                         4145 ns     4143 ns    168118 bytes_per_second=117.851M/s
SetBitRunReader/99                          354 ns      353 ns   1978715 bytes_per_second=1.34923G/s
ReverseSetBitRunReader/-1                  8421 ns     8417 ns     83486 bytes_per_second=58.0115M/s
ReverseSetBitRunReader/0                   33.4 ns     33.4 ns  21037889 bytes_per_second=14.2938G/s
ReverseSetBitRunReader/10                  1513 ns     1512 ns    467855 bytes_per_second=322.861M/s
ReverseSetBitRunReader/25                  3451 ns     3448 ns    202580 bytes_per_second=141.593M/s
ReverseSetBitRunReader/50                  4649 ns     4617 ns    149863 bytes_per_second=105.76M/s
ReverseSetBitRunReader/60                  4513 ns     4501 ns    157672 bytes_per_second=108.489M/s
ReverseSetBitRunReader/75                  3580 ns     3573 ns    194630 bytes_per_second=136.648M/s
ReverseSetBitRunReader/99                   332 ns      323 ns   2185390 bytes_per_second=1.4759G/s
VisitBits/8192                            62529 ns    62421 ns     11277 bytes_per_second=250.316M/s
VisitBitsUnrolled/8192                    10295 ns    10290 ns     66433 bytes_per_second=1.48293G/s
SetBitsTo/2                                3.44 ns     3.44 ns 200494362 bytes_per_second=554.859M/s
SetBitsTo/16                               6.87 ns     6.87 ns 100227660 bytes_per_second=2.16862G/s
SetBitsTo/1024                             12.2 ns     12.2 ns  56052465 bytes_per_second=78.161G/s
SetBitsTo/131072                           2998 ns     2997 ns    274108 bytes_per_second=40.7337G/s
ReferenceNaiveBitmapWriter/8192          162284 ns   162221 ns      4306 bytes_per_second=48.1595M/s
BitmapWriter/8192                         63899 ns    63868 ns     10919 bytes_per_second=122.322M/s
FirstTimeBitmapWriter/8192                66944 ns    66918 ns     10509 bytes_per_second=116.747M/s
GenerateBits/8192                         66600 ns    66527 ns     10655 bytes_per_second=117.433M/s
GenerateBitsUnrolled/8192                 63659 ns    63637 ns     11028 bytes_per_second=122.767M/s
CopyBitmapWithoutOffset/8192                113 ns      113 ns   6564880 bytes_per_second=67.325G/s
CopyBitmapWithOffset/8192                   590 ns      590 ns   1185055 bytes_per_second=12.9281G/s
CopyBitmapWithOffsetBoth/8192              1427 ns     1426 ns    496930 bytes_per_second=5.35158G/s
BitmapEqualsWithoutOffset/8192              225 ns      225 ns   2946649 bytes_per_second=33.9253G/s
BitmapEqualsWithOffset/8192                 927 ns      926 ns    755532 bytes_per_second=8.23875G/s
BenchmarkBitmapAnd/32768/0                  507 ns      506 ns   1000000 bytes_per_second=60.3348G/s
BenchmarkBitmapAnd/131072/0                3777 ns     3775 ns    206341 bytes_per_second=32.3397G/s
BenchmarkBitmapAnd/32768/1                 4190 ns     4189 ns    167617 bytes_per_second=7.28547G/s
BenchmarkBitmapAnd/131072/1               15810 ns    15809 ns     44006 bytes_per_second=7.72164G/s
BenchmarkBitmapAnd/32768/2                 4186 ns     4184 ns    168228 bytes_per_second=7.29366G/s
BenchmarkBitmapAnd/131072/2               16027 ns    16024 ns     44225 bytes_per_second=7.61801G/s
BenchmarkBitmapVisitBitsetAnd/32768/0    635335 ns   634998 ns      1108 bytes_per_second=49.2127M/s
BenchmarkBitmapVisitBitsetAnd/131072/0  2509032 ns  2507096 ns       271 bytes_per_second=49.8585M/s
BenchmarkBitmapVisitBitsetAnd/32768/1    637952 ns   628201 ns      1133 bytes_per_second=49.7452M/s
BenchmarkBitmapVisitBitsetAnd/131072/1  2481080 ns  2480039 ns       281 bytes_per_second=50.4024M/s
BenchmarkBitmapVisitBitsetAnd/32768/2    626262 ns   625913 ns      1130 bytes_per_second=49.927M/s
BenchmarkBitmapVisitBitsetAnd/131072/2  2501350 ns  2500163 ns       282 bytes_per_second=49.9967M/s
BenchmarkBitmapVisitUInt8And/32768/0      19113 ns    19104 ns     36615 bytes_per_second=1.59748G/s
BenchmarkBitmapVisitUInt8And/131072/0     77321 ns    77252 ns      8394 bytes_per_second=1.58016G/s
BenchmarkBitmapVisitUInt8And/32768/1      25639 ns    25628 ns     27275 bytes_per_second=1.19078G/s
BenchmarkBitmapVisitUInt8And/131072/1    106034 ns   105291 ns      6568 bytes_per_second=1.15936G/s
BenchmarkBitmapVisitUInt8And/32768/2      25685 ns    25676 ns     27225 bytes_per_second=1.18855G/s
BenchmarkBitmapVisitUInt8And/131072/2    102330 ns   102292 ns      6874 bytes_per_second=1.19335G/s
BenchmarkBitmapVisitUInt64And/32768/0      1985 ns     1983 ns    356961 bytes_per_second=15.3931G/s
BenchmarkBitmapVisitUInt64And/131072/0     7849 ns     7845 ns     87993 bytes_per_second=15.5596G/s
BenchmarkBitmapVisitUInt64And/32768/1      4072 ns     4066 ns    172934 bytes_per_second=7.50486G/s
BenchmarkBitmapVisitUInt64And/131072/1    14624 ns    14617 ns     47750 bytes_per_second=8.3513G/s
BenchmarkBitmapVisitUInt64And/32768/2      4046 ns     4044 ns    173114 bytes_per_second=7.54624G/s
BenchmarkBitmapVisitUInt64And/131072/2    14608 ns    14603 ns     47745 bytes_per_second=8.35953G/s
```

Before:

```
BitRunReader/-1                            9851 ns     9778 ns     72564 bytes_per_second=49.9364M/s
BitRunReader/0                              151 ns      150 ns   4713773 bytes_per_second=3.18241G/s
BitRunReader/10                            1862 ns     1810 ns    399247 bytes_per_second=269.696M/s
BitRunReader/25                            3558 ns     3552 ns    197438 bytes_per_second=137.484M/s
BitRunReader/50                            5051 ns     5028 ns    140890 bytes_per_second=97.1205M/s
BitRunReader/60                            4731 ns     4695 ns    147650 bytes_per_second=104.005M/s
BitRunReader/75                            3779 ns     3761 ns    188246 bytes_per_second=129.834M/s
BitRunReader/99                             400 ns      399 ns   1750416 bytes_per_second=1.19611G/s
BitRunReaderLinear/-1                      5959 ns     5921 ns    118993 bytes_per_second=82.4684M/s
BitRunReaderLinear/0                       2868 ns     2817 ns    251639 bytes_per_second=173.309M/s
BitRunReaderLinear/10                      3318 ns     3299 ns    210512 bytes_per_second=148.011M/s
BitRunReaderLinear/25                      4238 ns     4160 ns    171548 bytes_per_second=117.381M/s
BitRunReaderLinear/50                      4799 ns     4765 ns    146327 bytes_per_second=102.473M/s
BitRunReaderLinear/60                      6237 ns     5137 ns    148266 bytes_per_second=95.0566M/s
BitRunReaderLinear/75                      4166 ns     4124 ns    163251 bytes_per_second=118.391M/s
BitRunReaderLinear/99                      3194 ns     3117 ns    225191 bytes_per_second=156.63M/s
SetBitRunReader/-1                        10720 ns    10406 ns     67369 bytes_per_second=46.9242M/s
SetBitRunReader/0                          38.7 ns     37.2 ns  18975949 bytes_per_second=12.8012G/s
SetBitRunReader/10                         1864 ns     1801 ns    388658 bytes_per_second=271.055M/s
SetBitRunReader/25                         4253 ns     4179 ns    168497 bytes_per_second=116.832M/s
SetBitRunReader/50                         5745 ns     5646 ns    123732 bytes_per_second=86.49M/s
SetBitRunReader/60                         5588 ns     5401 ns    128778 bytes_per_second=90.401M/s
SetBitRunReader/75                         4576 ns     4372 ns    161457 bytes_per_second=111.672M/s
SetBitRunReader/99                          384 ns      372 ns   1874535 bytes_per_second=1.28167G/s
ReverseSetBitRunReader/-1                  9195 ns     8838 ns     78374 bytes_per_second=55.2473M/s
ReverseSetBitRunReader/0                   38.0 ns     35.7 ns  20435510 bytes_per_second=13.3507G/s
ReverseSetBitRunReader/10                  1715 ns     1609 ns    415295 bytes_per_second=303.425M/s
ReverseSetBitRunReader/25                  4044 ns     3728 ns    188133 bytes_per_second=130.966M/s
ReverseSetBitRunReader/50                  5226 ns     4945 ns    140588 bytes_per_second=98.7392M/s
ReverseSetBitRunReader/60                  4861 ns     4712 ns    148026 bytes_per_second=103.635M/s
ReverseSetBitRunReader/75                  3791 ns     3745 ns    180838 bytes_per_second=130.373M/s
ReverseSetBitRunReader/99                   336 ns      332 ns   2092769 bytes_per_second=1.43647G/s
VisitBits/8192                            68772 ns    65844 ns     10642 bytes_per_second=237.304M/s
VisitBitsUnrolled/8192                    10995 ns    10838 ns     64390 bytes_per_second=1.40794G/s
SetBitsTo/2                                3.71 ns     3.64 ns 194128446 bytes_per_second=523.67M/s
SetBitsTo/16                               7.23 ns     7.18 ns  95434157 bytes_per_second=2.07555G/s
SetBitsTo/1024                             13.3 ns     12.9 ns  53696217 bytes_per_second=74.0268G/s
SetBitsTo/131072                           2284 ns     2254 ns    311704 bytes_per_second=54.1618G/s
ReferenceNaiveBitmapWriter/8192          179032 ns   170889 ns      4023 bytes_per_second=45.7167M/s
BitmapWriter/8192                         66093 ns    65720 ns     10579 bytes_per_second=118.876M/s
FirstTimeBitmapWriter/8192                68766 ns    68523 ns     10242 bytes_per_second=114.013M/s
GenerateBits/8192                         67005 ns    66774 ns     10463 bytes_per_second=116.999M/s
GenerateBitsUnrolled/8192                 64922 ns    64678 ns     10681 bytes_per_second=120.791M/s
CopyBitmapWithoutOffset/8192                123 ns      115 ns   6582783 bytes_per_second=66.6195G/s
CopyBitmapWithOffset/8192                   611 ns      606 ns   1169747 bytes_per_second=12.5907G/s
CopyBitmapWithOffsetBoth/8192              1475 ns     1471 ns    465045 bytes_per_second=5.18633G/s
BitmapEqualsWithoutOffset/8192              232 ns      231 ns   3036244 bytes_per_second=33.0157G/s
BitmapEqualsWithOffset/8192                 962 ns      960 ns    736214 bytes_per_second=7.94465G/s
BenchmarkBitmapAnd/32768/0                  506 ns      506 ns   1371554 bytes_per_second=60.3483G/s
BenchmarkBitmapAnd/131072/0                3266 ns     3265 ns    203884 bytes_per_second=37.3864G/s
BenchmarkBitmapAnd/32768/1                 4380 ns     4355 ns    164819 bytes_per_second=7.0071G/s
BenchmarkBitmapAnd/131072/1               16415 ns    16362 ns     43104 bytes_per_second=7.46055G/s
BenchmarkBitmapAnd/32768/2                 4519 ns     4369 ns    163878 bytes_per_second=6.98446G/s
BenchmarkBitmapAnd/131072/2               18100 ns    17124 ns     40973 bytes_per_second=7.1284G/s
BenchmarkBitmapVisitBitsetAnd/32768/0    675592 ns   660185 ns      1015 bytes_per_second=47.3352M/s
BenchmarkBitmapVisitBitsetAnd/131072/0  3438938 ns  2690124 ns       267 bytes_per_second=46.4663M/s
BenchmarkBitmapVisitBitsetAnd/32768/1    718345 ns   665442 ns      1010 bytes_per_second=46.9613M/s
BenchmarkBitmapVisitBitsetAnd/131072/1  2643298 ns  2616118 ns       263 bytes_per_second=47.7807M/s
BenchmarkBitmapVisitBitsetAnd/32768/2    687543 ns   659680 ns      1075 bytes_per_second=47.3715M/s
BenchmarkBitmapVisitBitsetAnd/131072/2  2743586 ns  2629706 ns       265 bytes_per_second=47.5338M/s
BenchmarkBitmapVisitUInt8And/32768/0      23392 ns    21112 ns     34389 bytes_per_second=1.44551G/s
BenchmarkBitmapVisitUInt8And/131072/0    112039 ns    87872 ns      7955 bytes_per_second=1.38919G/s
BenchmarkBitmapVisitUInt8And/32768/1      31099 ns    28303 ns     24534 bytes_per_second=1104.11M/s
BenchmarkBitmapVisitUInt8And/131072/1    119082 ns   110789 ns      6168 bytes_per_second=1.10182G/s
BenchmarkBitmapVisitUInt8And/32768/2      27608 ns    27186 ns     25909 bytes_per_second=1.12254G/s
BenchmarkBitmapVisitUInt8And/131072/2    110701 ns   108126 ns      6377 bytes_per_second=1.12896G/s
BenchmarkBitmapVisitUInt64And/32768/0      2084 ns     2075 ns    332810 bytes_per_second=14.7093G/s
BenchmarkBitmapVisitUInt64And/131072/0     9174 ns     8683 ns     82603 bytes_per_second=14.0586G/s
BenchmarkBitmapVisitUInt64And/32768/1      4329 ns     4275 ns    163332 bytes_per_second=7.13794G/s
BenchmarkBitmapVisitUInt64And/131072/1    15776 ns    15315 ns     46095 bytes_per_second=7.97043G/s
BenchmarkBitmapVisitUInt64And/32768/2      4325 ns     4256 ns    166594 bytes_per_second=7.17106G/s
BenchmarkBitmapVisitUInt64And/131072/2    15706 ns    15396 ns     46283 bytes_per_second=7.92854G/s
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
