jorgecarleitao commented on a change in pull request #1556:
URL: https://github.com/apache/arrow-datafusion/pull/1556#discussion_r785396533
##########
File path: datafusion/src/datasource/file_format/parquet.rs
##########
@@ -238,29 +246,30 @@ fn summarize_min_max(
}
}
}
- _ => {}
+ PhysicalType::FixedLenByteArray(_) => {
+ // type not supported yet
+ }
}
+
+ Ok(())
}
/// Read and parse the schema of the Parquet file at location `path`
fn fetch_schema(object_reader: Arc<dyn ObjectReader>) -> Result<Schema> {
- let obj_reader = ChunkObjectReader(object_reader);
- let file_reader = Arc::new(SerializedFileReader::new(obj_reader)?);
- let mut arrow_reader = ParquetFileArrowReader::new(file_reader);
- let schema = arrow_reader.get_schema()?;
-
+ let mut reader = object_reader.sync_reader()?;
+ let meta_data = read_metadata(&mut reader)?;
+ let schema = get_schema(&meta_data)?;
Ok(schema)
}
/// Read and parse the statistics of the Parquet file at location `path`
fn fetch_statistics(object_reader: Arc<dyn ObjectReader>) -> Result<Statistics> {
- let obj_reader = ChunkObjectReader(object_reader);
- let file_reader = Arc::new(SerializedFileReader::new(obj_reader)?);
- let mut arrow_reader = ParquetFileArrowReader::new(file_reader);
- let schema = arrow_reader.get_schema()?;
+ let mut reader = object_reader.sync_reader()?;
Review comment:
```suggestion
let mut reader = object_reader.sync_reader()?;
```
IMO this is the culprit of the perf regressions, since this reader does not
buffer anything. Filed under #1583.
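
To illustrate the concern, here is a minimal standalone sketch using only the standard library. It is not DataFusion code: `CountingReader`, `read_in_small_steps` and `underlying_calls` are hypothetical names, and the 4-byte read size is an assumption standing in for a parser that pulls small fields one at a time. It counts how many `read` calls reach the underlying source with and without a `std::io::BufReader` in between.

```rust
use std::cell::Cell;
use std::io::{self, BufReader, Cursor, Read};
use std::rc::Rc;

/// Hypothetical wrapper that counts how many `read` calls reach the inner reader.
struct CountingReader<R> {
    inner: R,
    calls: Rc<Cell<usize>>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls.set(self.calls.get() + 1);
        self.inner.read(buf)
    }
}

/// Drains `reader` in 4-byte steps, roughly how a parser might pull small fields.
fn read_in_small_steps<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut field = [0u8; 4];
    let mut total = 0;
    loop {
        let n = reader.read(&mut field)?;
        if n == 0 {
            return Ok(total);
        }
        total += n;
    }
}

/// Returns how many `read` calls hit the underlying 4 KiB source.
fn underlying_calls(buffered: bool) -> io::Result<usize> {
    let calls = Rc::new(Cell::new(0));
    let source = CountingReader {
        inner: Cursor::new(vec![0u8; 4096]),
        calls: Rc::clone(&calls),
    };
    if buffered {
        // BufReader fetches large chunks and serves the small reads from memory.
        read_in_small_steps(BufReader::new(source))?;
    } else {
        // Every small read goes straight to the source.
        read_in_small_steps(source)?;
    }
    Ok(calls.get())
}

fn main() -> io::Result<()> {
    println!("unbuffered: {} calls", underlying_calls(false)?);
    println!("buffered:   {} calls", underlying_calls(true)?);
    Ok(())
}
```

With an in-memory `Cursor` the extra calls are cheap, but if each call on the real reader turns into a file or object-store request, the unbuffered path could plausibly account for the regression. Whether wrapping the result of `sync_reader()` in a `BufReader` is the right fix here depends on what that reader actually does, which this sketch does not model.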