rdettai commented on issue #1363: URL: https://github.com/apache/arrow-datafusion/issues/1363#issuecomment-980013409
One possible reason is that #1010 introduced the `ObjectStore` abstraction: https://github.com/apache/arrow-datafusion/blob/414c826bf06fd22e0bb52edbb497791b5fe558e0/datafusion/src/physical_plan/file_format/parquet.rs#L408-L411

This abstraction requires **dynamic dispatch on the reader** (`fn sync_chunk_reader(&self, start: u64, length: usize) -> Result<Box<dyn Read + Send + Sync>>`), which can indeed reduce performance if `read()` is called very frequently.

Now that I think about it, some old memories are coming back: if I remember correctly, when I was first playing with the parquet reader two years ago, I noticed something like this happening. `read()` was called in such a way that it often fetched only 1 byte at a time.
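To illustrate why dispatch cost depends on how often `read()` is called, here is a minimal, self-contained sketch. It is not DataFusion or parquet code: `CountingReader`, `boxed_reader`, `drain_byte_by_byte` and `count_inner_calls` are hypothetical names, and `boxed_reader` only mimics the `Box<dyn Read + Send + Sync>` return type of `sync_chunk_reader`. It counts how many dynamic `read()` calls reach the boxed reader when data is consumed one byte at a time, with and without a `std::io::BufReader` in front. The buffering is shown as one possible mitigation under these assumptions, not as what the parquet reader actually does.

```rust
use std::io::{BufReader, Cursor, Read, Result};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper that counts how many `read()` calls reach the inner reader.
struct CountingReader<R> {
    inner: R,
    calls: Arc<AtomicUsize>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.inner.read(buf)
    }
}

/// Hypothetical stand-in for a chunk reader: like `sync_chunk_reader`, it hands
/// back a trait object, so every `read()` on it is a dynamic (vtable) call.
fn boxed_reader(data: Vec<u8>, calls: Arc<AtomicUsize>) -> Box<dyn Read + Send + Sync> {
    Box::new(CountingReader { inner: Cursor::new(data), calls })
}

/// Consume a reader one byte per `read()` call, the access pattern described above.
fn drain_byte_by_byte<R: Read>(mut reader: R) -> Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// Returns how many dynamic `read()` calls were made on the boxed reader while
/// draining `len` bytes, optionally through a `BufReader` of the given capacity.
fn count_inner_calls(len: usize, buffer_capacity: Option<usize>) -> Result<usize> {
    let calls = Arc::new(AtomicUsize::new(0));
    let reader = boxed_reader(vec![7u8; len], Arc::clone(&calls));
    match buffer_capacity {
        None => drain_byte_by_byte(reader)?,
        Some(cap) => drain_byte_by_byte(BufReader::with_capacity(cap, reader))?,
    };
    Ok(calls.load(Ordering::Relaxed))
}

fn main() -> Result<()> {
    println!("unbuffered: {} dyn read() calls", count_inner_calls(10_000, None)?);
    println!("buffered:   {} dyn read() calls", count_inner_calls(10_000, Some(8 * 1024))?);
    Ok(())
}
```

With 10,000 bytes, the unbuffered case makes 10,001 dynamic calls (one per byte plus the final end-of-stream call), while an 8 KiB buffer reduces this to 3. The sketch only shows the call-count effect; it does not measure the actual overhead in DataFusion.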