westonpace edited a comment on pull request #12150:
URL: https://github.com/apache/arrow/pull/12150#issuecomment-1012676572


   From these benchmarks I was a little surprised by how much impact reading a small number of columns has on overall performance, although for all benchmarks except UncachedFile, more columns also means a larger metadata-to-data ratio.
   
   This is the motivation for ARROW-14577:
   
   ```
   ReadUncachedFile/num_cols:1/is_partial:1/real_time               62928170 ns     18097667 ns            9 bytes_per_second=254.258M/s
   ReadUncachedFile/num_cols:8/is_partial:1/real_time              133338397 ns     33394170 ns            4 bytes_per_second=119.995M/s
   ReadUncachedFile/num_cols:64/is_partial:1/real_time            1467858164 ns    284143075 ns            1 bytes_per_second=87.2019M/s
   ReadUncachedFileAsync/num_cols:1/is_partial:1/real_time          82149012 ns      4770464 ns            8 bytes_per_second=194.768M/s
   ReadUncachedFileAsync/num_cols:8/is_partial:1/real_time         999669229 ns     25252003 ns            1 bytes_per_second=16.0053M/s
   ReadUncachedFileAsync/num_cols:64/is_partial:1/real_time       8953151862 ns    122078755 ns            1 bytes_per_second=14.2966M/s
   ```
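
To put rough numbers on the gap between the sync and async variants, here is a quick sketch that is not part of the benchmark suite. The real-time values are copied from the output above; the names `sync_ns`, `async_ns`, and `slowdown` are purely illustrative.

```python
# Real-time figures (ns) copied from the benchmark output above, keyed by num_cols.
sync_ns = {1: 62928170, 8: 133338397, 64: 1467858164}          # ReadUncachedFile
async_ns = {1: 82149012, 8: 999669229, 64: 8953151862}         # ReadUncachedFileAsync


def slowdown(baseline_ns, candidate_ns):
    """Return candidate/baseline real-time ratio for each column count."""
    return {cols: candidate_ns[cols] / baseline_ns[cols] for cols in baseline_ns}


ratios = slowdown(sync_ns, async_ns)
for cols, ratio in ratios.items():
    print(f"num_cols={cols}: async takes {ratio:.1f}x the real time of sync")
```

With these figures the async reader takes roughly 1.3x, 7.5x, and 6.1x the real time of the sync reader at 1, 8, and 64 columns respectively.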
   
   This is concerning since the two benchmarks should be doing the exact same task (a partial read with 1 column should be the same as a full read with 1 column), yet the partial read is roughly 11x slower:
   ```
   ReadMmapCachedFile/num_cols:1/is_partial:0/real_time               125726 ns       125677 ns         4359 bytes_per_second=124.278G/s
   ReadMmapCachedFile/num_cols:1/is_partial:1/real_time              1404416 ns      1403848 ns          429 bytes_per_second=11.1256G/s
   ```



