[I] Fast non-validating reader mode to count records in Avro OCF files [arrow-rs]

via GitHub Tue, 24 Mar 2026 09:39:22 -0700


mzabaluev opened a new issue, #9613:
URL: https://github.com/apache/arrow-rs/issues/9613


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   A query like `SELECT COUNT(*) ...` on an Avro data source needs no data 
fields, only the number of rows in the partitioned data set.
   With the Avro OCF format, this information can be obtained by decoding just 
the block frames, presuming that the data encoding is well-formed and the 
number of encoded records in each block matches the one stated in the block 
header.
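
A rough sketch of the frame-only counting described above, using only the standard library. It assumes the OCF layout from the Avro specification: the magic bytes `Obj\x01`, a metadata map, a 16-byte sync marker, then blocks that each carry a record count, a byte size, the payload, and the sync marker again. The name `count_records_unvalidated` and the helper functions are hypothetical and are not part of arrow-avro. The payload bytes are never decoded, so the result is only as trustworthy as the block headers.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const MAGIC: &[u8; 4] = b"Obj\x01";

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Reads one Avro `long` (zigzag varint). Returns `Ok(None)` on a clean
/// end of input before the first byte.
fn read_long<R: Read>(r: &mut R) -> io::Result<Option<i64>> {
    let (mut value, mut shift) = (0u64, 0u32);
    let mut byte = [0u8; 1];
    loop {
        if r.read(&mut byte)? == 0 {
            return if shift == 0 {
                Ok(None)
            } else {
                Err(io::ErrorKind::UnexpectedEof.into())
            };
        }
        if shift > 63 {
            return Err(invalid("varint too long"));
        }
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            // Undo the zigzag mapping.
            return Ok(Some(((value >> 1) as i64) ^ -((value & 1) as i64)));
        }
        shift += 7;
    }
}

fn require_long<R: Read>(r: &mut R) -> io::Result<i64> {
    read_long(r)?.ok_or_else(|| io::ErrorKind::UnexpectedEof.into())
}

/// Skips a length-prefixed `bytes` or `string` value without reading it.
fn skip_len_prefixed<R: Read + Seek>(r: &mut R) -> io::Result<()> {
    let len = require_long(r)?;
    if len < 0 {
        return Err(invalid("negative length"));
    }
    r.seek(SeekFrom::Current(len))?;
    Ok(())
}

/// Hypothetical helper: sums the record counts stated in the block headers.
pub fn count_records_unvalidated<R: Read + Seek>(r: &mut R) -> io::Result<u64> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(invalid("not an Avro OCF file"));
    }
    // The header metadata is an Avro map, encoded as a series of map blocks.
    loop {
        let mut n = require_long(r)?;
        if n == 0 {
            break;
        }
        if n < 0 {
            require_long(r)?; // byte size of this map block, unused here
            n = n.checked_neg().ok_or_else(|| invalid("bad map block count"))?;
        }
        for _ in 0..n {
            skip_len_prefixed(r)?; // key
            skip_len_prefixed(r)?; // value
        }
    }
    let mut sync = [0u8; 16];
    r.read_exact(&mut sync)?;

    let mut total = 0u64;
    while let Some(count) = read_long(r)? {
        let size = require_long(r)?;
        if count < 0 || size < 0 {
            return Err(invalid("negative block count or size"));
        }
        r.seek(SeekFrom::Current(size))?; // skip the payload entirely
        let mut marker = [0u8; 16];
        r.read_exact(&mut marker)?;
        if marker != sync {
            return Err(invalid("sync marker mismatch"));
        }
        total += count as u64;
    }
    Ok(total)
}
```

Checking the sync marker after each block is the only integrity check in this sketch. A block whose header overstates or understates its record count would still be accepted as long as the byte size is consistent.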
   
   **Describe the solution you'd like**
   Add an opt-in method to the reader builders that makes the reader 
bypass all Avro data decoding, including the skipping parsers. Instead, the 
decoder should parse only the OCF block framing, sum the record counts, and 
produce record batches that have no columns but carry the row counts and 
metadata corresponding to the file content. This method should not be combined 
with `with_reader_schema`.
   
   The name of the method should give sufficient warning, e.g. 
`count_without_validation`.
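
One way the builder option could reject the combination with `with_reader_schema`, sketched with stand-in types. `SketchReaderBuilder` and `SketchMode` are hypothetical and exist only for illustration; they are not arrow-avro types, and the real builders may report the conflict differently.

```rust
/// Hypothetical stand-in for a reader builder; not the arrow-avro type.
#[derive(Debug, Default)]
pub struct SketchReaderBuilder {
    reader_schema: Option<String>,
    count_only: bool,
}

/// What the built reader would do with the file contents.
#[derive(Debug, PartialEq)]
pub enum SketchMode {
    /// Decode (and thereby validate) the Avro data.
    Decode { reader_schema: Option<String> },
    /// Parse block framing only and emit column-less batches with row counts.
    CountOnly,
}

impl SketchReaderBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_reader_schema(mut self, schema_json: &str) -> Self {
        self.reader_schema = Some(schema_json.to_string());
        self
    }

    /// The explicit, loudly named opt-in proposed in this issue.
    pub fn count_without_validation(mut self) -> Self {
        self.count_only = true;
        self
    }

    pub fn build(self) -> Result<SketchMode, String> {
        match (self.count_only, self.reader_schema) {
            (true, Some(_)) => Err(
                "count_without_validation cannot be combined with with_reader_schema"
                    .to_string(),
            ),
            (true, None) => Ok(SketchMode::CountOnly),
            (false, reader_schema) => Ok(SketchMode::Decode { reader_schema }),
        }
    }
}
```

Failing in `build` rather than silently ignoring the reader schema keeps the non-validating path an explicit choice.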
   
   **Describe alternatives you've considered**
   This behavior could be enabled when the reader schema has no fields. 
However, since this could lead to invalid encoded data being accepted based on 
the block framing, it's preferable that an explicit option is used.
   
   **Additional context**
   #9608 concerns the case where the reader schema has no fields but the 
Avro data is still validated.


