lidavidm opened a new pull request #10060:
URL: https://github.com/apache/arrow/pull/10060
This implements a CountRows method for the scanner. It first asks each fragment whether it can count its rows using only metadata; if it cannot, the scanner projects away all columns and counts the rows of the resulting batches. Originally, I thought we did not need a special case for the metadata-only path, because the Parquet reader already skips I/O and fabricates empty batches when asked to read no columns. However, benchmarking showed that the overhead of the rest of the scan pipeline was still significant, so I implemented the optimization after all.
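A minimal sketch of the two-path counting logic described above, written in C++ because Arrow's datasets layer is C++. Everything here is hypothetical: Batch, Fragment, CountRowsFromMetadata, ScanWithNoColumns, and the two example fragment classes are illustrative stand-ins, not Arrow's actual API, and the sketch is synchronous for simplicity. It shows the fast path (answer from metadata, produce no batches) and the fallback (scan with no columns and sum batch row counts).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch; only the row count matters here.
struct Batch {
  int64_t num_rows;
};

// Hypothetical fragment interface loosely modeled on the PR description
// (NOT the actual Arrow API).
class Fragment {
 public:
  virtual ~Fragment() = default;
  // Row count from metadata alone (e.g. a file footer), or std::nullopt
  // if the fragment cannot answer without scanning.
  virtual std::optional<int64_t> CountRowsFromMetadata() const = 0;
  // Scan with every column projected away; batches still report row counts.
  virtual std::vector<Batch> ScanWithNoColumns() const = 0;
};

// Sum row counts across fragments, preferring the metadata-only fast path.
int64_t CountRows(const std::vector<std::shared_ptr<Fragment>>& fragments) {
  int64_t total = 0;
  for (const auto& fragment : fragments) {
    if (std::optional<int64_t> fast = fragment->CountRowsFromMetadata()) {
      total += *fast;  // fast path: no batches are produced at all
      continue;
    }
    for (const Batch& batch : fragment->ScanWithNoColumns()) {
      total += batch.num_rows;  // fallback: count rows of column-less batches
    }
  }
  return total;
}

// Hypothetical fragment whose metadata records its row count.
class MetadataFragment : public Fragment {
 public:
  explicit MetadataFragment(int64_t num_rows) : num_rows_(num_rows) {
    assert(num_rows >= 0);  // a row count can never be negative
  }
  std::optional<int64_t> CountRowsFromMetadata() const override {
    return num_rows_;
  }
  std::vector<Batch> ScanWithNoColumns() const override {
    scanned_ = true;  // records that the slow path was taken
    return {Batch{num_rows_}};
  }
  bool scanned() const { return scanned_; }

 private:
  int64_t num_rows_;
  mutable bool scanned_ = false;
};

// Hypothetical fragment without usable metadata; it must be scanned.
class ScanOnlyFragment : public Fragment {
 public:
  explicit ScanOnlyFragment(std::vector<int64_t> batch_sizes)
      : batch_sizes_(std::move(batch_sizes)) {}
  std::optional<int64_t> CountRowsFromMetadata() const override {
    return std::nullopt;
  }
  std::vector<Batch> ScanWithNoColumns() const override {
    std::vector<Batch> batches;
    for (int64_t n : batch_sizes_) batches.push_back(Batch{n});
    return batches;
  }

 private:
  std::vector<int64_t> batch_sizes_;
};
```

For example, a MetadataFragment(100) together with a ScanOnlyFragment whose batches hold 3, 4, and 5 rows counts to 112, and the metadata fragment is never scanned.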
