rdettai commented on issue #1363:
URL: https://github.com/apache/arrow-datafusion/issues/1363#issuecomment-980013409


   One possible reason might be that #1010 introduced the use of the
`ObjectStore`:
https://github.com/apache/arrow-datafusion/blob/414c826bf06fd22e0bb52edbb497791b5fe558e0/datafusion/src/physical_plan/file_format/parquet.rs#L408-L411
 
   
   The abstraction requires **dynamic dispatch on the reader** (`fn
sync_chunk_reader(&self, start: u64, length: usize) -> Result<Box<dyn Read +
Send + Sync>>`), which can indeed reduce performance if `read()` is called
many times. Actually, now that I think about it, some old memories are coming
back: if I remember correctly, when I was first playing with the parquet
reader 2 years ago, I noticed something like this happening. `read()` was
called in a way that often fetched only 1 byte at a time.
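A minimal sketch of the effect, assuming a simplified setup and not DataFusion's actual code: `sync_chunk_reader` here is a hypothetical free-function stand-in that returns `Box<dyn Read + Send + Sync>` like the quoted signature, and `CountingReader` is a hypothetical wrapper that counts how many `read()` calls reach the underlying source. Reading byte by byte through the box costs one dynamic call per byte. Wrapping the box in `std::io::BufReader` is shown only as one possible mitigation (an assumption, not necessarily what the project did): it fills an in-memory buffer with large reads and serves the 1-byte reads from it.

```rust
use std::io::{self, BufReader, Cursor, Read};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper (not part of DataFusion) that counts how many
/// times `read()` reaches the underlying source.
struct CountingReader<R> {
    inner: R,
    calls: Arc<AtomicUsize>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.inner.read(buf)
    }
}

/// Hypothetical stand-in for an object-store chunk reader: it returns a
/// trait object, so every `read()` goes through dynamic dispatch.
fn sync_chunk_reader(data: Vec<u8>, calls: Arc<AtomicUsize>) -> Box<dyn Read + Send + Sync> {
    Box::new(CountingReader { inner: Cursor::new(data), calls })
}

/// Consume a reader one byte at a time, mimicking the access pattern
/// described in the comment. Returns the number of bytes read.
fn read_byte_by_byte<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    loop {
        match reader.read(&mut byte)? {
            0 => return Ok(total),
            n => total += n,
        }
    }
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 64 * 1024];

    // Unbuffered: one dynamic `read()` call per byte, plus one that hits EOF.
    let unbuffered_calls = Arc::new(AtomicUsize::new(0));
    let n = read_byte_by_byte(sync_chunk_reader(data.clone(), unbuffered_calls.clone()))?;
    assert_eq!(n, data.len());

    // Buffered: BufReader pulls large chunks from the boxed reader and
    // answers the 1-byte reads from its internal buffer.
    let buffered_calls = Arc::new(AtomicUsize::new(0));
    let boxed = sync_chunk_reader(data.clone(), buffered_calls.clone());
    let n = read_byte_by_byte(BufReader::new(boxed))?;
    assert_eq!(n, data.len());

    println!("unbuffered read() calls: {}", unbuffered_calls.load(Ordering::Relaxed));
    println!("buffered read() calls:   {}", buffered_calls.load(Ordering::Relaxed));
    Ok(())
}
```

The buffered count is far smaller because `BufReader` only forwards a read to the boxed reader when its buffer is empty, so the per-call dispatch cost is paid once per buffer fill rather than once per byte.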


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.