tustvold commented on issue #3214: URL: https://github.com/apache/arrow-datafusion/issues/3214#issuecomment-1233034720
I think there are two different optimisations being discussed here:

* Skip interacting with the file based on catalog statistics, if available
* Remove the projection "hack" and delegate to the file readers

Parquet has supported the latter since https://github.com/apache/arrow-rs/pull/1560, and CSV/JSON will support it once https://github.com/apache/arrow-rs/pull/2604 is released. I think it should then be possible to remove the workaround, as it will no longer be necessary.

As to the former, I think it should be fairly straightforward to implement a physical optimiser pass that uses statistics to simplify counts into projections. I had thought we had already implemented this tbh... :thinking:
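
To make the statistics-based idea concrete, here is a rough sketch of what such a pass might look like. It does not use DataFusion's actual plan or optimiser types: `Plan`, its variants, the `exact_num_rows` field, and `fold_count_from_statistics` are all hypothetical stand-ins invented for illustration. The assumption is that the catalog can supply an exact row count for a scan, in which case an ungrouped `COUNT(*)` sitting directly on that scan can be replaced by a literal without touching the file.

```rust
/// Hypothetical, heavily simplified plan representation.
/// These are NOT DataFusion types; they exist only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Scan of a file-backed table. `exact_num_rows` is assumed to come
    /// from catalog statistics and is `None` when unknown or inexact.
    Scan {
        table: String,
        exact_num_rows: Option<usize>,
    },
    /// Row filter; it changes the row count, so statistics cannot be reused.
    Filter { predicate: String, input: Box<Plan> },
    /// Ungrouped `COUNT(*)` over the input.
    CountStar { input: Box<Plan> },
    /// Single-row literal result that a count can be folded into.
    Literal { value: usize },
}

/// Rewrites `COUNT(*)` directly over a scan with an exact row count into a
/// literal. Every other plan shape is left structurally unchanged.
fn fold_count_from_statistics(plan: Plan) -> Plan {
    match plan {
        Plan::CountStar { input } => match *input {
            // Exact statistics available: no need to read the file at all.
            Plan::Scan {
                exact_num_rows: Some(n),
                ..
            } => Plan::Literal { value: n },
            // Otherwise keep the count and keep optimising below it.
            other => Plan::CountStar {
                input: Box::new(fold_count_from_statistics(other)),
            },
        },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(fold_count_from_statistics(*input)),
        },
        // Scans and literals have no children to rewrite.
        leaf => leaf,
    }
}
```

In this sketch the rewrite only fires when the count sits immediately above the scan and the statistics are exact; a filter in between, or a missing row count, leaves the plan as it was. A real pass would also need to handle null counts for `COUNT(column)` and whatever exactness guarantees the statistics actually carry.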
