mapleFU commented on PR #39807:
URL: https://github.com/apache/arrow/pull/39807#issuecomment-2009764268
On my M2 macOS machine with a Release (-O2) build:
after:
```
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark
Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
5614 ns 5572 ns 111778 bytes_per_second=388.87Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
19338 ns 19219 ns 36300 bytes_per_second=947.722Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
25125 ns 24936 ns 27939 bytes_per_second=730.436Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
294839 ns 293949 ns 2376 bytes_per_second=966.059Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
300140 ns 299194 ns 2349 bytes_per_second=949.124Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
343334 ns 342222 ns 2033 bytes_per_second=829.787Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
7607 ns 7587 ns 93970 bytes_per_second=285.575Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
22354 ns 22280 ns 31016 bytes_per_second=817.501Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
27680 ns 27601 ns 25450 bytes_per_second=659.902Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
303460 ns 302540 ns 2236 bytes_per_second=938.625Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
309832 ns 307459 ns 2277 bytes_per_second=923.609Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
376627 ns 371434 ns 2007 bytes_per_second=764.527Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
6258 ns 6114 ns 114493 bytes_per_second=354.367Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
19911 ns 19806 ns 35271 bytes_per_second=919.631Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
25911 ns 25687 ns 28390 bytes_per_second=709.089Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
311971 ns 309630 ns 2263 bytes_per_second=917.132Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
377304 ns 375390 ns 1871 bytes_per_second=756.472Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
343459 ns 342646 ns 1984 bytes_per_second=828.761Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
7540 ns 7520 ns 93488 bytes_per_second=288.146Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
23149 ns 23060 ns 30119 bytes_per_second=789.876Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
27161 ns 27099 ns 25562 bytes_per_second=672.134Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
315507 ns 314274 ns 2198 bytes_per_second=903.579Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
384325 ns 382741 ns 1833 bytes_per_second=741.943Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
353540 ns 350845 ns 1998 bytes_per_second=809.393Mi/s
```
before:
```
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
5678 ns 5644 ns 112138 bytes_per_second=383.908Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
24113 ns 22721 ns 31043 bytes_per_second=801.638Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
26425 ns 26231 ns 26138 bytes_per_second=694.383Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
383174 ns 374055 ns 1893 bytes_per_second=759.171Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
395898 ns 383379 ns 1969 bytes_per_second=740.708Mi/s
CompressionInputZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
422406 ns 416277 ns 1658 bytes_per_second=682.17Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
9351 ns 8686 ns 83041 bytes_per_second=249.445Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
24849 ns 24622 ns 26555 bytes_per_second=739.749Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
30673 ns 30388 ns 23338 bytes_per_second=599.387Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
400725 ns 398423 ns 1775 bytes_per_second=712.739Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
397969 ns 395804 ns 1756 bytes_per_second=717.456Mi/s
CompressionInputNonZeroCopyBenchmarkIntoBuffer<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
477663 ns 469510 ns 1559 bytes_per_second=604.826Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
6486 ns 6229 ns 121200 bytes_per_second=347.848Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
21640 ns 21362 ns 32395 bytes_per_second=852.631Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
26726 ns 26556 ns 24419 bytes_per_second=685.893Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
363655 ns 361942 ns 1926 bytes_per_second=784.578Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
443300 ns 440702 ns 1566 bytes_per_second=644.363Mi/s
CompressionInputZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
404404 ns 402066 ns 1715 bytes_per_second=706.282Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:8192/PerReadBytes:8192
8004 ns 7972 ns 86126 bytes_per_second=271.807Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:8192
24898 ns 24779 ns 24412 bytes_per_second=735.057Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:65536/PerReadBytes:65536
30572 ns 30433 ns 23030 bytes_per_second=598.493Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:8192
414666 ns 410512 ns 1732 bytes_per_second=691.75Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:65536
466478 ns 464285 ns 1389 bytes_per_second=611.632Mi/s
CompressionInputNonZeroCopyBenchmarkDirectRead<::arrow::Compression::LZ4_FRAME>/InputBytes:1048576/PerReadBytes:1048576
477242 ns 467817 ns 1568 bytes_per_second=607.015Mi/s
```
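For a quick read of the two runs, the throughput ratio (after / before) can be computed per row. The sketch below is only illustrative: it copies the `bytes_per_second` values for the four `InputBytes:1048576/PerReadBytes:8192` rows from the output above, and the shortened row labels and the `speedup` helper are my own names, not part of the benchmark.

```python
# Throughput in MiB/s copied from the two runs above, as (before, after).
# Only the InputBytes:1048576/PerReadBytes:8192 rows are included here;
# the short labels are hypothetical abbreviations of the benchmark names.
results = {
    "ZeroCopy/IntoBuffer": (759.171, 966.059),
    "NonZeroCopy/IntoBuffer": (712.739, 938.625),
    "ZeroCopy/DirectRead": (784.578, 917.132),
    "NonZeroCopy/DirectRead": (691.75, 903.579),
}


def speedup(before: float, after: float) -> float:
    """Return the throughput ratio after / before (values > 1 mean faster)."""
    return after / before


for name, (before, after) in results.items():
    ratio = speedup(before, after)
    print(f"{name}: {before:.1f} -> {after:.1f} MiB/s ({ratio:.2f}x)")
```

These are single runs on one machine, so the ratios should be read as rough indications rather than precise measurements.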