alamb commented on issue #7456:
URL: https://github.com/apache/arrow-rs/issues/7456#issuecomment-2898114762

   Status update
   
   The high-level plan to improve performance has two parts:
   1. Adaptive iteration / representation of filter results (basically https://github.com/apache/arrow-rs/issues/5523)
   2. Caching the results of filtering when the column is used in the final projection (basically https://github.com/apache/arrow-rs/issues/4864)
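   To illustrate what an "adaptive representation" of filter results could mean, here is a minimal, self-contained sketch (not the arrow-rs implementation). It assumes two hypothetical representations: a per-row boolean mask, which suits scattered matches, and a list of half-open row ranges, which suits long contiguous runs. The names `Selection`, `mask_to_ranges`, `choose_selection`, and the `min_avg_run_len` threshold are illustrative assumptions, not existing arrow-rs APIs.

```rust
/// Hypothetical representation of which rows passed a predicate.
#[derive(Debug, PartialEq)]
pub enum Selection {
    /// One bool per row; compact when matches are scattered.
    Mask(Vec<bool>),
    /// Half-open (start, end) row ranges; compact when matches are clustered.
    Ranges(Vec<(usize, usize)>),
}

/// Convert a boolean mask into half-open ranges of selected rows.
pub fn mask_to_ranges(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &selected) in mask.iter().enumerate() {
        match (selected, start) {
            // A new run of selected rows begins.
            (true, None) => start = Some(i),
            // The current run ends just before row `i`.
            (false, Some(s)) => {
                ranges.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        ranges.push((s, mask.len()));
    }
    ranges
}

/// Pick a representation (assumed heuristic): use ranges when the average
/// selected run is at least `min_avg_run_len` rows long, otherwise keep the mask.
pub fn choose_selection(mask: Vec<bool>, min_avg_run_len: usize) -> Selection {
    let ranges = mask_to_ranges(&mask);
    let selected: usize = ranges.iter().map(|(s, e)| e - s).sum();
    if !ranges.is_empty() && selected / ranges.len() >= min_avg_run_len {
        Selection::Ranges(ranges)
    } else {
        Selection::Mask(mask)
    }
}

/// Copy only the selected rows of `values`, whichever representation is used.
pub fn apply<T: Clone>(values: &[T], sel: &Selection) -> Vec<T> {
    match sel {
        Selection::Mask(mask) => values
            .iter()
            .zip(mask)
            .filter_map(|(v, &m)| if m { Some(v.clone()) } else { None })
            .collect(),
        Selection::Ranges(ranges) => ranges
            .iter()
            .flat_map(|&(s, e)| values[s..e].iter().cloned())
            .collect(),
    }
}
```

   Both representations produce the same output rows; the point of choosing adaptively is that iterating long ranges can copy slices in bulk, while a mask avoids building many tiny ranges when matches are sparse and scattered.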
   
   My main concern about reusing the result of filtering is memory usage, and I think it is important to keep that usage to a minimum. The current APIs (the `filter` and `concat` kernels) require a 2x memory overhead, so I think it is important to reduce that as well as add some way to limit memory consumption when caching the filtering results.
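   One way to see where the 2x overhead comes from: filtering each batch materializes a new filtered array, and `concat` then copies all of those pieces into one output, so both copies are alive at the peak. The sketch below is a hypothetical alternative, not an arrow-rs API: a `FilteredAppender` (an illustrative name) that copies selected rows straight into a single output buffer and refuses to grow past an assumed `max_bytes` budget. It uses plain `Vec<i64>` in place of Arrow arrays and tracks logical size rather than allocated capacity.

```rust
/// Hypothetical builder that appends selected rows from each input batch
/// directly into one output buffer, instead of materializing a filtered array
/// per batch and concatenating them afterwards.
pub struct FilteredAppender {
    out: Vec<i64>,
    /// Assumed budget (in bytes) for the cached output; logical size only.
    max_bytes: usize,
}

impl FilteredAppender {
    pub fn new(max_bytes: usize) -> Self {
        Self { out: Vec::new(), max_bytes }
    }

    /// Append the rows of `batch` where `mask` is true.
    /// Returns false and appends nothing if the budget would be exceeded;
    /// a caller could then fall back to not caching this column.
    pub fn append(&mut self, batch: &[i64], mask: &[bool]) -> bool {
        assert_eq!(batch.len(), mask.len(), "mask must match batch length");
        let selected = mask.iter().filter(|&&m| m).count();
        let new_bytes = (self.out.len() + selected) * std::mem::size_of::<i64>();
        if new_bytes > self.max_bytes {
            return false;
        }
        self.out.reserve(selected);
        // Copy only the selected values; no intermediate filtered batch is built.
        self.out.extend(
            batch
                .iter()
                .zip(mask)
                .filter_map(|(&v, &m)| if m { Some(v) } else { None }),
        );
        true
    }

    /// Hand back the accumulated (already "concatenated") result.
    pub fn finish(self) -> Vec<i64> {
        self.out
    }
}
```

   For example, with a budget of four `i64` values (32 bytes), appending `[1, 2, 3]` with mask `[true, false, true]` and then `[4, 5]` fully selected succeeds, while a further append is rejected, leaving `[1, 3, 4, 5]`.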
   
   We also need to implement more sophisticated logic when there are multiple 
predicates.
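   As a point of reference for the multi-predicate case, one common technique (not necessarily what is planned here) is to evaluate a conjunction so that each later predicate only runs on rows that survived the earlier ones. The sketch below assumes predicates are plain Rust closures over individual values; `evaluate_conjunction` is a hypothetical name, and it also returns the number of predicate evaluations to show the work saved.

```rust
/// Evaluate `p1 AND p2 AND ...` over `values`, testing each later predicate
/// only on rows still selected by the earlier ones.
/// Returns the final selection mask and the number of predicate calls made.
pub fn evaluate_conjunction<T>(
    values: &[T],
    predicates: &[&dyn Fn(&T) -> bool],
) -> (Vec<bool>, usize) {
    let mut mask = vec![true; values.len()];
    let mut evaluations = 0;
    for pred in predicates {
        for (i, v) in values.iter().enumerate() {
            // Skip rows already rejected by an earlier predicate.
            if mask[i] {
                evaluations += 1;
                mask[i] = pred(v);
            }
        }
    }
    (mask, evaluations)
}
```

   With `values = [1, 5, 10, 15, 20]`, predicates `v > 4` then `v % 2 == 0` make 5 + 4 = 9 calls instead of 10 and select rows 2 and 4. More sophisticated strategies would also decide the predicate order and when to switch between the representations above.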
   
   My next steps are:
   1. Try to factor out the adaptive representation of filter results, together with @zhuqi-lucas.
   2. Explore ways to reduce the memory overhead of caching results (this should help other APIs too).
   