mzabaluev opened a new issue, #9613: URL: https://github.com/apache/arrow-rs/issues/9613
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

A query like `SELECT COUNT(*) ...` on an Avro data source needs no data fields, only the number of rows in the partitioned data set. With the Avro OCF format, this information can be obtained by decoding just the block frames, presuming that the data encoding is well-formed and the number of encoded records in each block matches the one stated in the block header.

**Describe the solution you'd like**

Add an option method to the reader builders that would make the reader bypass any Avro data decoding, including the skipping parsers. Instead, the decoder should only parse the OCF data blocks to sum the row counts, and produce record batches with no columns, but with the row counts and metadata corresponding to the file content.

This method should not be used together with `with_reader_schema`. The name of the method should give sufficient warning, e.g. `count_without_validation`.

**Describe alternatives you've considered**

This behavior could be enabled when the reader schema has no fields. However, since this could lead to invalid encoded data being accepted based on the block framing, it's preferable that an explicit option is used.

**Additional context**

#9608 concerns the behavior when the reader schema has no fields, but validation of Avro data is performed.
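To make the idea concrete, here is a minimal standalone sketch of frame-only counting, using only the Rust standard library. It is not the `arrow-avro` API: the function names (`count_rows_without_validation`, `read_long`, `skip_header`, and so on) are hypothetical, and the sync-marker comparison is an extra framing check I assumed would be cheap enough to keep. The sketch relies on the OCF layout from the Avro specification: a 4-byte magic, a metadata map, a 16-byte sync marker, then blocks that each consist of a record count, a byte size, the payload, and the sync marker.

```rust
use std::io::{self, Read};

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Reads one Avro `long` (zigzag varint). Returns `Ok(None)` on a clean
/// end of input before the first byte.
fn read_long<R: Read>(r: &mut R) -> io::Result<Option<i64>> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    let mut byte = [0u8; 1];
    loop {
        match r.read_exact(&mut byte) {
            Ok(()) => {}
            Err(e) => {
                if e.kind() == io::ErrorKind::UnexpectedEof && shift == 0 {
                    return Ok(None);
                }
                return Err(e);
            }
        }
        if shift >= 64 {
            return Err(invalid("varint is too long"));
        }
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            // Zigzag decoding: the low bit carries the sign.
            return Ok(Some(((value >> 1) as i64) ^ -((value & 1) as i64)));
        }
        shift += 7;
    }
}

/// Reads a `long` that must be present and non-negative.
fn read_len<R: Read>(r: &mut R) -> io::Result<u64> {
    match read_long(r)? {
        Some(v) if v >= 0 => Ok(v as u64),
        Some(_) => Err(invalid("negative length")),
        None => Err(invalid("unexpected end of input")),
    }
}

/// Discards exactly `n` bytes without decoding them.
fn skip<R: Read>(r: &mut R, n: u64) -> io::Result<()> {
    let copied = io::copy(&mut r.by_ref().take(n), &mut io::sink())?;
    if copied == n {
        Ok(())
    } else {
        Err(invalid("truncated data"))
    }
}

/// Checks the magic, skips the metadata map, and returns the sync marker.
fn skip_header<R: Read>(r: &mut R) -> io::Result<[u8; 16]> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"Obj\x01" {
        return Err(invalid("not an Avro object container file"));
    }
    loop {
        let count = read_long(r)?.ok_or_else(|| invalid("truncated header"))?;
        if count == 0 {
            break; // end of the metadata map
        }
        if count < 0 {
            // A negative count is followed by the map block's byte size.
            let size = read_len(r)?;
            skip(r, size)?;
            continue;
        }
        for _ in 0..count * 2 {
            // Each entry is a string key and a bytes value, both length-prefixed.
            let len = read_len(r)?;
            skip(r, len)?;
        }
    }
    let mut sync = [0u8; 16];
    r.read_exact(&mut sync)?;
    Ok(sync)
}

/// Sums the record counts stated in the block headers. Block payloads are
/// skipped, so the counts are trusted rather than verified.
pub fn count_rows_without_validation<R: Read>(mut reader: R) -> io::Result<u64> {
    let sync = skip_header(&mut reader)?;
    let mut total: u64 = 0;
    while let Some(count) = read_long(&mut reader)? {
        if count < 0 {
            return Err(invalid("negative record count"));
        }
        let size = read_len(&mut reader)?;
        skip(&mut reader, size)?;
        let mut marker = [0u8; 16];
        reader.read_exact(&mut marker)?;
        if marker != sync {
            return Err(invalid("sync marker mismatch"));
        }
        total = total
            .checked_add(count as u64)
            .ok_or_else(|| invalid("row count overflow"))?;
    }
    Ok(total)
}
```

Because the payload bytes are never inspected, a block whose header overstates or understates its record count would go unnoticed, which is the reason the proposal asks for an explicit, clearly named opt-in.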
