[GitHub] [arrow] lidavidm opened a new pull request #10060: ARROW-9697: [C++][Python][R][Dataset] Add CountRows for Scanner

GitBox Thu, 15 Apr 2021 12:37:55 -0700


lidavidm opened a new pull request #10060:
URL: https://github.com/apache/arrow/pull/10060



   This implements a CountRows method for Scanner. It first asks each fragment 
whether it can count its rows using only metadata; if the fragment cannot, the 
scanner projects away all columns and counts the rows in the resulting batches.
   
   Originally, I thought we did not need a special optimization for the 
metadata-only case, because the Parquet reader skips I/O and fabricates 
empty batches when asked to read no columns. In benchmarks, however, the 
overhead of the rest of the pipeline was still significant, so I implemented 
the optimization after all.
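
   For readers who want the control flow spelled out, here is a rough sketch of the two paths described above. It is plain Python and does not use the Arrow API: the `Fragment` class, `try_count_rows`, `scan`, and `metadata_row_count` are hypothetical names chosen for illustration, and the actual C++ implementation in this PR may be structured differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for a dataset fragment (not the Arrow API)."""

    batches: List[List[dict]]  # each batch is a list of row dicts
    metadata_row_count: Optional[int] = None  # None: metadata cannot answer

    def try_count_rows(self) -> Optional[int]:
        # Fast path: answer from metadata only, without reading any data.
        return self.metadata_row_count

    def scan(self, columns: List[str]) -> Iterable[List[dict]]:
        # Slow path: yield batches restricted to the requested columns.
        for batch in self.batches:
            yield [{c: row[c] for c in columns} for row in batch]


def count_rows(fragments: Iterable[Fragment]) -> int:
    """Sum row counts, preferring metadata and falling back to a scan."""
    total = 0
    for fragment in fragments:
        fast = fragment.try_count_rows()
        if fast is not None:
            total += fast
        else:
            # Project away all columns; the empty rows still have a count.
            total += sum(len(batch) for batch in fragment.scan(columns=[]))
    return total


# One fragment answers from metadata, the other must be scanned.
with_meta = Fragment(batches=[], metadata_row_count=1000)
without_meta = Fragment(batches=[[{"x": 1}, {"x": 2}], [{"x": 3}]])
print(count_rows([with_meta, without_meta]))  # prints 1003
```

   In this toy model the fallback still walks every batch even though the batches carry no columns, which is the kind of per-batch pipeline overhead that motivated adding the metadata-only path.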


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
