alamb commented on PR #16208:
URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2927076431

   I ran q24 locally, did see a small slowdown, and did some profiling.
   
   As expected, filtering is about 30% of the overall execution time. Of the 
filtering time, about 1/2 goes to creating the output:
   ![Screenshot 2025-06-01 at 7 20 04 
AM](https://github.com/user-attachments/assets/9612ad19-c793-4069-b4b2-5e4bb2f01333)
   
   The analysis reveals some more places to potentially improve (also aligned 
with @Dandandan's suggestions):
   ![Screenshot 2025-06-01 at 7 24 41 
AM](https://github.com/user-attachments/assets/c65618ac-f65c-44ad-9fb1-be706769b1c9)
   
   
   There is also some evidence of reallocation, as @Dandandan mentions: 
https://github.com/apache/datafusion/pull/16208#discussion_r2115799584
   ![Screenshot 2025-06-01 at 7 29 29 
AM](https://github.com/user-attachments/assets/f3f3affe-69a2-4ca7-9efa-d31276bd4d66)
   
   
   Next steps:
   1. I will try to update the code to avoid allocations where possible 
(specifically, recreate builders)
   2. Optimize predicate creation some more (create it once / slice rather than 
twice)
   
   
   
   
   <details><summary>Details</summary>
   <p>
   
   On this branch (alamb/test_filter_pushdown):
   
   ```shell
   ./datafusion-cli-alamb_test_filter_pushdown -f q24-many.sql | grep Elapsed
   Elapsed 0.258 seconds.
   Elapsed 0.237 seconds.
   Elapsed 0.220 seconds.
   Elapsed 0.213 seconds.
   Elapsed 0.223 seconds.
   Elapsed 0.225 seconds.
   Elapsed 0.223 seconds.
   Elapsed 0.219 seconds.
   Elapsed 0.223 seconds.
   ```
   
   On main:
   ```shell
   datafusion-cli -f q24-many.sql | grep Elapsed
   Elapsed 0.220 seconds.
   Elapsed 0.217 seconds.
   Elapsed 0.203 seconds.
   Elapsed 0.211 seconds.
   Elapsed 0.216 seconds.
   Elapsed 0.197 seconds.
   Elapsed 0.203 seconds.
   Elapsed 0.199 seconds.
   Elapsed 0.218 seconds.
   ```
   
   </p>
   </details> 
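
To put a rough number on the "small slowdown", here is a minimal sketch that summarizes the two sets of `Elapsed` timings from the details section. The list names and the `slowdown_pct` helper are only illustrative, and with nine runs per side (the first branch run may include warm-up) the result should be read as indicative, not as a rigorous benchmark.

```python
from statistics import mean, median

# Elapsed times in seconds, copied from the two runs in the details section.
branch = [0.258, 0.237, 0.220, 0.213, 0.223, 0.225, 0.223, 0.219, 0.223]
main = [0.220, 0.217, 0.203, 0.211, 0.216, 0.197, 0.203, 0.199, 0.218]


def slowdown_pct(new: float, old: float) -> float:
    """Relative slowdown of `new` over `old`, in percent (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


mean_slowdown = slowdown_pct(mean(branch), mean(main))
median_slowdown = slowdown_pct(median(branch), median(main))

print(f"mean:   branch {mean(branch):.3f}s, main {mean(main):.3f}s, {mean_slowdown:+.1f}%")
print(f"median: branch {median(branch):.3f}s, main {median(main):.3f}s, {median_slowdown:+.1f}%")
```

On these numbers the branch comes out roughly 8% slower on the mean and roughly 6% slower on the median. The median is less affected by the slower first run on the branch.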
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

