WillAyd commented on issue #41224:
URL: https://github.com/apache/arrow/issues/41224#issuecomment-2058121190

   IIUC we have the required metadata at this point in the parquet reader:
   
   
https://github.com/apache/arrow/blob/e0f31aa1d4007ebce01cd4bca369d12c4d083162/cpp/src/parquet/file_reader.cc#L117
   
   The metadata, however, is not forwarded to the factory function that 
creates the RecordReader(s). For primitive types, I think the readers work 
around this by doing things like 
`PARQUET_THROW_NOT_OK(data_builder_.Reserve(num_decoded * byte_width_));`, but 
that doesn't help much for binary types, which have no defined `byte_width_`.
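
   To make the difference concrete, here is a minimal, standalone sketch of 
the two reservation strategies. It does not use the Arrow or Parquet APIs: 
`SizeHintSketch`, `BinaryBuilderSketch`, and all of their members are 
hypothetical names invented for illustration, and the "size hint" stands in 
for whatever byte estimate the forwarded metadata might provide (which field 
that would be is an assumption left open here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical size hint derived from forwarded metadata.
// NOT the parquet::ColumnChunkMetaData API; field names are assumptions.
struct SizeHintSketch {
  int64_t num_values;            // number of values expected
  int64_t estimated_data_bytes;  // estimated total payload bytes
};

// Hypothetical, simplified binary builder: offsets plus one contiguous
// data buffer, loosely mirroring how variable-length binary data is laid out.
class BinaryBuilderSketch {
 public:
  // Fixed-width case: the payload size is exactly num_values * byte_width,
  // so a single up-front reservation is always possible.
  void ReserveFixedWidth(int64_t num_values, int32_t byte_width) {
    assert(num_values >= 0 && byte_width > 0);
    offsets_.reserve(offsets_.size() + static_cast<std::size_t>(num_values));
    data_.reserve(data_.size() +
                  static_cast<std::size_t>(num_values * byte_width));
  }

  // Variable-width case: there is no byte width, so the payload size must
  // come from somewhere else, e.g. a hint computed from metadata.
  void ReserveFromSizeHint(const SizeHintSketch& hint) {
    assert(hint.num_values >= 0 && hint.estimated_data_bytes >= 0);
    offsets_.reserve(offsets_.size() +
                     static_cast<std::size_t>(hint.num_values));
    data_.reserve(data_.size() +
                  static_cast<std::size_t>(hint.estimated_data_bytes));
  }

  void Append(std::string_view value) {
    const std::size_t capacity_before = data_.capacity();
    data_.insert(data_.end(), value.begin(), value.end());
    // Count unplanned growth of the data buffer during appends.
    if (data_.capacity() != capacity_before) ++data_reallocations_;
    offsets_.push_back(static_cast<int64_t>(data_.size()));
  }

  std::size_t size() const { return offsets_.size() - 1; }
  int data_reallocations() const { return data_reallocations_; }

 private:
  std::vector<int64_t> offsets_ = {0};
  std::vector<char> data_;
  int data_reallocations_ = 0;
};
```

   With a hint of `{3, 11}`, appending `"foo"`, `"quux"`, and `"arrw"` (11 
bytes in total) fits in the reserved buffer, so `data_reallocations()` stays 
at zero. Without any reservation, the first non-empty append already forces 
the data buffer to grow. If the hint underestimates the payload, the builder 
still works and simply falls back to growing on demand.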
   
   I still need to research this more, but I am hopeful that forwarding that 
metadata can help performance and streamline some of the RecordReader code.


