alamb commented on issue #11442:
URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2228160735

   > Do you have a list in mind the area that is worth for performance 
improvement? Somethings I known that are still active in my head
   
   In my mind, here are some "obvious" performance projects (the ones I have 
the most confidence would make a meaningful difference on ClickBench or 
TPCH queries). I can maybe put this in the documentation.
   
   ## Integrate StringView into Parquet / Filtering / Grouping
   * #10918 
   
   @XiangpengHao is doing this as his summer project and doing an amazing job. 
I also think this is a great example of the level of effort required to 
drive one of these performance projects. It requires implementing the features, 
then analyzing / profiling, identifying the bottlenecks, and then making PRs to 
remove the bottlenecks. See #10918 and 
https://github.com/apache/arrow-rs/issues/5374 for the entire list. Some of my 
favorites: 
   * https://github.com/apache/arrow-rs/pull/6009
   * https://github.com/apache/arrow-rs/issues/6034
   
   
   
   **What**: Use newly added `StringView` from arrow to improve performance (by 
avoiding variable length/string data copies)
   **Why**: For queries that deal with string data in ClickBench or TPCH, a 
large amount of time is spent in parquet decoding as well as filtering and 
grouping. 
   **What is left**: See #10918  and 
https://github.com/apache/arrow-rs/issues/5374
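
   To illustrate why a view-based layout avoids string copies, here is a 
minimal conceptual sketch. It is not the arrow-rs `StringViewArray` API: the 
`View` and `ViewColumn` types and their methods are hypothetical names made up 
for this example, and the real format is more involved (for instance, it 
inlines short strings directly in the view). The idea it shows is that each 
value is a small fixed-size reference into a shared byte buffer, so a filter 
only has to copy the references and never the string bytes.

```rust
// Conceptual sketch only: NOT the arrow-rs `StringViewArray` API.
// All names here (`View`, `ViewColumn`) are hypothetical.

/// A fixed-size reference to one string inside a shared byte buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// A column of strings stored as one byte buffer plus one view per value.
struct ViewColumn {
    buffer: Vec<u8>, // all string bytes, stored exactly once
    views: Vec<View>,
}

impl ViewColumn {
    /// Build a column by appending every value to the shared buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            views.push(View { offset: buffer.len(), len: v.len() });
            buffer.extend_from_slice(v.as_bytes());
        }
        ViewColumn { buffer, views }
    }

    /// Resolve a view back to the string it points at (no copy).
    fn get(&self, view: View) -> &str {
        let bytes = &self.buffer[view.offset..view.offset + view.len];
        std::str::from_utf8(bytes).expect("buffer was built from valid UTF-8")
    }

    /// Filtering copies only the small views; the string bytes stay in place.
    fn filter(&self, keep: impl Fn(&str) -> bool) -> Vec<View> {
        self.views
            .iter()
            .copied()
            .filter(|v| keep(self.get(*v)))
            .collect()
    }
}
```

   With an offsets-based string array, the same filter would have to copy the 
bytes of every surviving string into a new contiguous buffer, which is the 
kind of variable length data copy mentioned above.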
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

