Re: [PR] Read only enough bytes to infer Arrow IPC file schema via stream [arrow-datafusion]

via GitHub Wed, 01 Nov 2023 13:02:42 -0700


tustvold commented on code in PR #7962:
URL: https://github.com/apache/arrow-datafusion/pull/7962#discussion_r1379276288



##########
datafusion/core/src/datasource/file_format/arrow.rs:
##########
@@ -99,7 +102,177 @@ impl FileFormat for ArrowFormat {
     }
 }
 
-fn read_arrow_schema_from_reader<R: Read + Seek>(reader: R) -> Result<SchemaRef> {
-    let reader = FileReader::try_new(reader, None)?;
-    Ok(reader.schema())
+const ARROW_MAGIC: [u8; 6] = [b'A', b'R', b'R', b'O', b'W', b'1'];

Review Comment:
   > reverting to the old method of reading the entire stream just to decode the schema.
   
   The idea would be to do something similar to what we do to read the Parquet footer; I provided a few more details on the linked ticket. The trick is to perform ranged reads.
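
A rough sketch of what ranged reads could look like here, not the approach from the PR or the linked ticket. It relies on the Arrow IPC file layout, in which the file ends with the footer, then a 4-byte little-endian footer length, then the `ARROW1` magic. The helper names `read_range`, `footer_range` and `read_footer_bytes` are hypothetical, and an in-memory byte slice stands in for whatever ranged-read API the object store provides. Decoding the footer flatbuffer into a schema is left out.

```rust
use std::ops::Range;

/// Magic bytes that open and close an Arrow IPC file.
const ARROW_MAGIC: [u8; 6] = *b"ARROW1";
/// Trailer = 4-byte little-endian footer length + 6-byte magic.
const TRAILER_LEN: u64 = 10;

/// Hypothetical stand-in for a ranged read against an object store.
/// Here it just copies a sub-slice of an in-memory buffer.
fn read_range(data: &[u8], range: Range<u64>) -> Vec<u8> {
    data[range.start as usize..range.end as usize].to_vec()
}

/// First ranged read: fetch only the 10-byte trailer and work out
/// which byte range holds the footer.
fn footer_range(data: &[u8]) -> Result<Range<u64>, String> {
    let file_len = data.len() as u64;
    if file_len < TRAILER_LEN {
        return Err("file too small to be an Arrow IPC file".to_string());
    }
    let trailer = read_range(data, (file_len - TRAILER_LEN)..file_len);
    if trailer[4..] != ARROW_MAGIC[..] {
        return Err("missing trailing ARROW1 magic".to_string());
    }
    let footer_len =
        i32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if footer_len < 0 {
        return Err("negative footer length".to_string());
    }
    let footer_end = file_len - TRAILER_LEN;
    let footer_len = footer_len as u64;
    if footer_len > footer_end {
        return Err("footer length exceeds file size".to_string());
    }
    Ok((footer_end - footer_len)..footer_end)
}

/// Second ranged read: fetch exactly the footer bytes, nothing else.
fn read_footer_bytes(data: &[u8]) -> Result<Vec<u8>, String> {
    let range = footer_range(data)?;
    Ok(read_range(data, range))
}
```

With this shape, inferring the schema costs two small reads (the trailer, then the footer) regardless of how large the record batches in the file are.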



