alamb commented on PR #6921:
URL: https://github.com/apache/arrow-rs/pull/6921#issuecomment-2718792433

   @XiangpengHao  and I just had a nice discussion about this ticket and next 
steps.
   
   One thing that he noted is that reviewing this PR (and understanding its 
implications) is tricky as it requires a lot of context. For example, there are 
two subsets of columns:
   * Predicate columns
   * Projection columns
   
   And those columns can be disjoint sets. This PR caches the intersection of 
those two sets of columns. Also, by design, this PR doesn't cache every page 
(it caches only 2 pages) to avoid increasing memory consumption.
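
To make the "intersection" idea concrete, here is a minimal sketch, not the PR's actual code. The helper name `cacheable_columns` and the use of plain column indices are assumptions for illustration only; the real implementation works on the reader's own column representation.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not from the PR): returns the column indices that
/// appear in BOTH the predicate (filter) columns and the projection
/// (output) columns, in ascending order.
fn cacheable_columns(predicate: &[usize], projection: &[usize]) -> Vec<usize> {
    let predicate: BTreeSet<usize> = predicate.iter().copied().collect();
    let projection: BTreeSet<usize> = projection.iter().copied().collect();
    // Only columns decoded for filtering AND needed in the output can be
    // reused from a cache; everything else is decoded at most once anyway.
    predicate.intersection(&projection).copied().collect()
}
```

With predicate columns `[0, 2, 5]` and projection columns `[2, 3, 5]`, only columns 2 and 5 would be candidates for caching. If the two sets are disjoint, nothing is cached.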
   
   To move forward, I think the next steps are:
   
   1. @XiangpengHao will write up the current state of affairs / document 
the existing code better
   2. We then rebase the PR against main
   3. Rerun the ClickBench / TPC-H DataFusion benchmarks
   
   @XiangpengHao mentioned that while ClickBench Q23 gets 2x faster with 
pushdown enabled and this PR, it is actually even faster when pushdown is 
enabled without this PR (i.e. this PR regresses the pushdown performance).
    
   Thus we also thought it would be valuable to:
   1. Put this new behavior behind an option that can be disabled in case we 
encounter issues rolling it out
   2. Figure out how to get performance back for Q23 (maybe not needed for this 
PR)
   


