alamb commented on issue #9404: URL: https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1986804684
I think the numbers are good to publish.

In terms of the slowdown, I agree it is likely something real (we are tracking something similar in https://github.com/apache/arrow-datafusion/issues/8836). Perhaps there is per-file or per-partition overhead that increased in 34.0.0 and that we haven't gotten to the bottom of.

@Ted-Jiang I wonder if this could be related to caching the file metadata. For example, I wonder if using something like https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html would improve performance, as it would avoid some small IOs at the start of the query.

It sort of feels like cheating to cache the metadata, though I suppose it would be allowed 🤔
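
To illustrate the metadata-caching idea above, here is a minimal, std-only sketch. It is not DataFusion's `CacheManager` API: `FileMeta`, `MetaCache`, and `get_or_fetch` are hypothetical names, and the closure stands in for the small IO that would read a file's metadata. It only shows why repeated queries over the same files could skip those reads.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-file metadata (for example, a row count
/// read from a file footer). Not a DataFusion type.
#[derive(Clone, Debug, PartialEq)]
struct FileMeta {
    num_rows: u64,
}

/// Hypothetical cache keyed by file path. Not DataFusion's CacheManager.
struct MetaCache {
    entries: HashMap<String, FileMeta>,
    /// Number of simulated metadata reads (cache misses).
    fetches: usize,
}

impl MetaCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), fetches: 0 }
    }

    /// Returns the cached metadata for `path`, calling `fetch` only on a miss.
    fn get_or_fetch<F>(&mut self, path: &str, fetch: F) -> FileMeta
    where
        F: FnOnce(&str) -> FileMeta,
    {
        if let Some(meta) = self.entries.get(path) {
            return meta.clone();
        }
        self.fetches += 1;
        let meta = fetch(path);
        self.entries.insert(path.to_string(), meta.clone());
        meta
    }
}

fn main() {
    let mut cache = MetaCache::new();
    let files = ["part-0.parquet", "part-1.parquet"];

    // Two "queries" over the same files: only the first one pays for the
    // simulated metadata reads.
    for _ in 0..2 {
        for path in files.iter() {
            let _ = cache.get_or_fetch(path, |_| FileMeta { num_rows: 1000 });
        }
    }

    println!("metadata fetches: {}", cache.fetches); // prints "metadata fetches: 2"
}
```

Without the cache, the same two passes would perform four metadata reads instead of two. Whether that accounts for the slowdown discussed here is an open question.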
