Re: [PR] Read only enough bytes to infer Arrow IPC file schema via stream [arrow-datafusion]

via GitHub Wed, 01 Nov 2023 10:17:57 -0700


tustvold commented on PR #7962:
URL: 
https://github.com/apache/arrow-datafusion/pull/7962#issuecomment-1789347016


   FWIW, for consistency we might want to do something closer to what we do for 
Parquet, where:
   
   * We have an estimate of the size of the footer which we fetch
   * We read the actual footer size
   * We then fetch any extra data needed
   * Once decoded, the footer provides information on the schema and where the 
data blocks are located
   
   This PR instead appears to read the first RecordBatch. Whilst I _think_ this 
should work (provided the file contains data), the more standard approach might 
be to read the footer.
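
   A rough sketch of the footer-first flow described above, for illustration 
only. It relies on the Arrow IPC file layout, which ends with the footer, a 
4-byte little-endian footer length, and the `ARROW1` magic. The `fetch` helper 
and `read_footer` function are hypothetical names, not DataFusion or arrow-rs 
APIs; `fetch` stands in for a ranged read against the object store, and 
decoding the flatbuffer footer into a schema and block locations is left out.

```rust
use std::ops::Range;

/// Magic bytes that terminate an Arrow IPC file.
const MAGIC: &[u8; 6] = b"ARROW1";
/// Trailer = 4-byte little-endian footer length + 6-byte magic.
const TRAILER_LEN: usize = 10;

/// Hypothetical stand-in for a ranged GET against an object store.
fn fetch(file: &[u8], range: Range<usize>) -> Vec<u8> {
    file[range].to_vec()
}

/// Returns the raw footer bytes and the number of fetches issued.
/// `size_hint` is the estimated size of footer plus trailer.
fn read_footer(file: &[u8], size_hint: usize) -> Result<(Vec<u8>, usize), String> {
    let file_len = file.len();
    if file_len < TRAILER_LEN {
        return Err("file too small to contain a trailer".to_string());
    }

    // 1. Speculatively fetch the estimated footer region from the end of the file.
    let guess = size_hint.max(TRAILER_LEN).min(file_len);
    let suffix_start = file_len - guess;
    let suffix = fetch(file, suffix_start..file_len);
    let mut fetches = 1;

    // 2. Read the actual footer size from the trailer.
    let trailer = &suffix[suffix.len() - TRAILER_LEN..];
    if trailer[4..] != MAGIC[..] {
        return Err("missing ARROW1 magic at end of file".to_string());
    }
    let footer_len = i32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if footer_len < 0 {
        return Err("negative footer length".to_string());
    }
    let footer_len = footer_len as usize;
    let needed = footer_len + TRAILER_LEN;
    if needed > file_len {
        return Err("footer length exceeds file size".to_string());
    }
    let footer_start = file_len - needed;

    // 3. Fetch any extra bytes only if the estimate was too small.
    let footer = if footer_start >= suffix_start {
        let off = footer_start - suffix_start;
        suffix[off..off + footer_len].to_vec()
    } else {
        let mut buf = fetch(file, footer_start..suffix_start);
        fetches += 1;
        buf.extend_from_slice(&suffix[..suffix.len() - TRAILER_LEN]);
        buf
    };

    // 4. A real implementation would now decode the footer to obtain the
    //    schema and the locations of the data blocks.
    Ok((footer, fetches))
}
```

   With a generous enough estimate this needs a single request; an 
undersized estimate costs one extra request, and neither case depends on the 
file containing a RecordBatch.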
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
