WillAyd commented on issue #41224: URL: https://github.com/apache/arrow/issues/41224#issuecomment-2058121190
IIUC, the Parquet reader already has the required metadata at this point: https://github.com/apache/arrow/blob/e0f31aa1d4007ebce01cd4bca369d12c4d083162/cpp/src/parquet/file_reader.cc#L117 However, that metadata is not forwarded to the factory function that creates the RecordReader(s). For primitive types, I think the readers work around this with calls like `PARQUET_THROW_NOT_OK(data_builder_.Reserve(num_decoded * byte_width_));`, but that doesn't help much for binary types, which have no defined `byte_width_`.

I still have to research this more, but I'm hopeful that forwarding the metadata can help performance and streamline some of the RecordReader code.
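
To illustrate why a size hint matters for binary columns, here is a minimal, standard-library-only sketch. It is not the Arrow or Parquet code: `ChunkSizeHint`, `ToyBinaryBuilder`, and `CountDataReallocations` are hypothetical names made up for this example, and the hint fields only stand in for whatever the column chunk metadata could supply (the real uncompressed size would probably be an estimate rather than the exact payload size). The idea is that a fixed-width reader can compute `num_decoded * byte_width_` up front, while a variable-length reader needs an externally supplied byte count to reserve its data buffer once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for size information taken from column chunk metadata.
struct ChunkSizeHint {
  int64_t num_values;               // number of values in the chunk
  int64_t total_uncompressed_bytes; // estimated bytes of value data
};

// Toy variable-length binary builder: an offsets vector plus one contiguous
// data buffer. This only mimics the general layout of a binary array builder.
class ToyBinaryBuilder {
 public:
  // Pre-size both buffers from the hint so later appends need not grow them.
  void Reserve(const ChunkSizeHint& hint) {
    assert(hint.num_values >= 0 && hint.total_uncompressed_bytes >= 0);
    offsets_.reserve(static_cast<std::size_t>(hint.num_values) + 1);
    data_.reserve(static_cast<std::size_t>(hint.total_uncompressed_bytes));
  }

  // Append one value and record whether the data buffer had to reallocate.
  void Append(const std::string& value) {
    const std::size_t capacity_before = data_.capacity();
    data_.insert(data_.end(), value.begin(), value.end());
    if (data_.capacity() != capacity_before) ++data_reallocations_;
    offsets_.push_back(static_cast<int64_t>(data_.size()));
  }

  int data_reallocations() const { return data_reallocations_; }

 private:
  std::vector<int64_t> offsets_{0};
  std::vector<char> data_;
  int data_reallocations_ = 0;
};

// Build the values with or without a hint and report how many times the
// data buffer grew during the appends.
int CountDataReallocations(const std::vector<std::string>& values, bool use_hint) {
  ToyBinaryBuilder builder;
  if (use_hint) {
    int64_t total_bytes = 0;
    for (const auto& v : values) total_bytes += static_cast<int64_t>(v.size());
    builder.Reserve({static_cast<int64_t>(values.size()), total_bytes});
  }
  for (const auto& v : values) builder.Append(v);
  return builder.data_reallocations();
}
```

With the hint, the data buffer is allocated once in `Reserve` and never grows during the appends. Without it, the buffer has to grow at least once as values arrive, which is the cost that forwarding the metadata might avoid.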
