alamb commented on issue #9404:
URL: 
https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1986804684

   I think the numbers are good to publish.
   
   As for the slowdown, I agree it is likely something real (we are 
tracking something similar in 
https://github.com/apache/arrow-datafusion/issues/8836).
   
   Perhaps some per-file or per-partition overhead increased in 34.0.0 and 
we haven't gotten to the bottom of it yet. @Ted-Jiang I wonder if this could 
be related to caching the file metadata. For example, I wonder if using 
something like 
https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html
 would improve performance, since it would avoid some small IOs at the start 
of the query.
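
To make the "avoid some small IOs" idea concrete, here is a rough, std-only sketch of why a per-file metadata cache could help when the same files are planned repeatedly. It is not DataFusion's `CacheManager` API: `FileMeta`, `CountingStore`, `MetaCache`, and `metadata_reads` are all hypothetical names made up for illustration, and the "store" only counts simulated metadata reads instead of doing real IO.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the footer/statistics of one file.
#[derive(Clone, Debug, PartialEq)]
struct FileMeta {
    num_rows: u64,
}

/// Hypothetical object store that counts the small metadata reads it serves.
struct CountingStore {
    metadata_reads: usize,
}

impl CountingStore {
    fn read_metadata(&mut self, path: &str) -> FileMeta {
        self.metadata_reads += 1;
        // Fake, deterministic metadata; a real reader would fetch the file footer.
        FileMeta { num_rows: path.len() as u64 }
    }
}

/// Minimal per-path metadata cache (assumption: files do not change between queries).
#[derive(Default)]
struct MetaCache {
    entries: HashMap<String, FileMeta>,
}

impl MetaCache {
    fn get_or_load(&mut self, store: &mut CountingStore, path: &str) -> FileMeta {
        if let Some(meta) = self.entries.get(path) {
            // Cache hit: no IO against the store.
            return meta.clone();
        }
        // Cache miss: one metadata read, then remember the result.
        let meta = store.read_metadata(path);
        self.entries.insert(path.to_string(), meta.clone());
        meta
    }
}

/// Simulates planning `queries` queries over `files` and returns how many
/// metadata reads reached the store.
fn metadata_reads(files: &[&str], queries: usize, use_cache: bool) -> usize {
    let mut store = CountingStore { metadata_reads: 0 };
    let mut cache = MetaCache::default();
    for _ in 0..queries {
        for path in files {
            if use_cache {
                cache.get_or_load(&mut store, path);
            } else {
                store.read_metadata(path);
            }
        }
    }
    store.metadata_reads
}
```

Without the cache, the number of metadata reads grows with files × queries; with it, each file is read once. A single cold query sees no benefit in this sketch, so whether such a cache helps a benchmark depends on whether queries are run more than once in the same session.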
   
   Caching the metadata sort of feels like cheating, however, though I 
suppose it would be allowed 🤔 
   
   
   

