[GitHub] [arrow-datafusion] tustvold commented on issue #3214: Don't scan first column on empty projection

GitBox Wed, 31 Aug 2022 07:45:17 -0700


tustvold commented on issue #3214:
URL: https://github.com/apache/arrow-datafusion/issues/3214#issuecomment-1233034720


   I think there are two different optimisations being discussed here:
   
   * Skip interacting with the file based on catalog statistics if available
   * Remove projection "hack" and delegate to file readers
   
   Parquet has supported the latter since 
https://github.com/apache/arrow-rs/pull/1560, and CSV/JSON will support it once 
https://github.com/apache/arrow-rs/pull/2604 is released. I think it should 
then be possible to remove the workaround, as it will no longer be necessary.
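   
   To illustrate what "delegate to file readers" means, here is a rough sketch of a reader that accepts an empty projection and still reports a row count. `ToyFile`, `Batch`, and `read` are hypothetical stand-ins invented for this example, not the arrow-rs or DataFusion types; the only assumption is that the row count comes from file metadata rather than from decoding a column.

```rust
// Hypothetical stand-ins for illustration only; not the arrow-rs API.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>, // one Vec per projected column
    num_rows: usize,        // carried separately so it survives zero columns
}

/// A toy column-major "file" with its row count stored as metadata.
struct ToyFile {
    num_rows: usize,
    columns: Vec<Vec<i64>>,
}

impl ToyFile {
    /// Materialise only the projected column indices.
    /// An empty projection copies no column data at all, but the
    /// resulting batch still knows how many rows it represents.
    fn read(&self, projection: &[usize]) -> Batch {
        Batch {
            columns: projection
                .iter()
                .map(|&i| self.columns[i].clone())
                .collect(),
            num_rows: self.num_rows,
        }
    }
}

fn main() {
    let file = ToyFile {
        num_rows: 3,
        columns: vec![vec![1, 2, 3], vec![10, 20, 30]],
    };

    // Empty projection: no "first column" hack needed to learn the row count.
    let batch = file.read(&[]);
    println!("columns = {}, rows = {}", batch.columns.len(), batch.num_rows);
}
```

   With a reader shaped like this, a `COUNT(*)`-style query can sum `num_rows` across batches without asking for any column.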
   
   As to the former, I think it should be fairly straightforward to implement a 
physical optimiser pass that uses statistics to simplify counts into 
projections. I had thought we had already implemented this 
tbh... :thinking: 
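
   A minimal sketch of the kind of rewrite I mean, on a made-up plan type. `Plan`, `exact_rows`, and `simplify_count` are hypothetical names for this example and not DataFusion's physical plan or optimiser API; `Literal` stands in for a projection of a constant, and the rewrite assumes the catalog row count is exact rather than an estimate.

```rust
// Hypothetical toy plan for illustration; not DataFusion's ExecutionPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A table scan; `exact_rows` is Some only when statistics are exact.
    Scan { table: String, exact_rows: Option<usize> },
    /// COUNT(*) over the input plan.
    CountStar(Box<Plan>),
    /// A constant result, standing in for a projection of a literal.
    Literal(usize),
}

/// Replace COUNT(*) directly over a scan with a constant when the
/// exact row count is known; otherwise leave the count in place.
fn simplify_count(plan: Plan) -> Plan {
    match plan {
        Plan::CountStar(input) => match *input {
            Plan::Scan { exact_rows: Some(n), .. } => Plan::Literal(n),
            other => Plan::CountStar(Box::new(simplify_count(other))),
        },
        other => other,
    }
}

fn main() {
    let with_stats = Plan::CountStar(Box::new(Plan::Scan {
        table: "t".to_string(),
        exact_rows: Some(42),
    }));
    let without_stats = Plan::CountStar(Box::new(Plan::Scan {
        table: "t".to_string(),
        exact_rows: None,
    }));

    // The first plan never needs to touch the file; the second still scans.
    println!("{:?}", simplify_count(with_stats));
    println!("{:?}", simplify_count(without_stats));
}
```

   When statistics are missing or inexact the plan is left unchanged, so the count falls back to the empty-projection scan described above.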


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

