[GitHub] [arrow-datafusion] alamb opened a new issue, #3147: Push additional parquet filtering into the parquet scan [EPIC]

GitBox Mon, 15 Aug 2022 05:05:47 -0700


alamb opened a new issue, #3147:
URL: https://github.com/apache/arrow-datafusion/issues/3147


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   The more filtering that can be pushed into the parquet reader, the faster a 
query will generally run, because less work is spent decoding and processing 
data that will eventually be filtered out of the plan.
   
   There are several ongoing workstreams that will eventually push 
substantial additional filtering down into the parquet scan, which should 
significantly improve performance for DataFusion. I wanted to capture them 
here to provide more visibility.
   
   cc @Ted-Jiang @tustvold @thinkharderdev 
   
   
   **Describe the solution you'd like**
   Here are some of the tasks I have collected. There are likely more -- please 
add them (either directly or via comments).
   - [ ] https://github.com/apache/arrow-datafusion/pull/2677
   - [ ] https://github.com/apache/arrow-rs/issues/1191
   - [ ] https://github.com/apache/arrow-rs/issues/2270
   - [ ] https://github.com/apache/arrow-datafusion/issues/847
   - [ ] Write a blog post on the topic
   
   

