alamb commented on issue #7456: URL: https://github.com/apache/arrow-rs/issues/7456#issuecomment-2898114762
Status update

The high-level plan to improve performance has two parts:
1. Adaptive iteration / representation of filter results (basically https://github.com/apache/arrow-rs/issues/5523)
2. Caching the results of filtering when the column is used in the final projection (basically https://github.com/apache/arrow-rs/issues/4864)

My main concern about reusing the result of filtering is memory usage, and I think it is important to keep that usage to a minimum. The current APIs (the `filter` and `concat` kernels) require a 2x memory overhead, so I think it is important to reduce that, as well as to add some way to limit memory consumption when caching the filtering results.

We also need to implement more sophisticated logic when there are multiple predicates.

My next steps are:
1. Try to factor out the adaptive representation of filter results with @zhuqi-lucas
2. Explore ways to reduce the memory overhead of caching results (this should help other APIs too).
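To make the "adaptive iteration / representation of filter results" item more concrete, here is a minimal, self-contained Rust sketch. It does not use arrow-rs: `FilterResult`, `selected_ranges`, `choose_representation`, `apply`, and the `min_avg_run_len` threshold are hypothetical names made up for illustration, and the "average selected run length" heuristic is an assumption, not the policy from the linked issue. It only shows the general idea that one filter result can be held either as a per-row mask or as a list of selected row ranges, with the cheaper form picked after looking at how clustered the selection is.

```rust
// Hypothetical sketch: none of these names are arrow-rs APIs.

/// Two possible representations of the same filter result.
#[derive(Debug, PartialEq)]
enum FilterResult {
    /// Half-open ranges `[start, end)` of selected rows; compact when runs are long.
    Ranges(Vec<(usize, usize)>),
    /// One flag per row; avoids building many tiny ranges when rows are scattered.
    Mask(Vec<bool>),
}

/// Collect the runs of `true` values in `mask` as half-open ranges.
fn selected_ranges(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &keep) in mask.iter().enumerate() {
        match (keep, start) {
            (true, None) => start = Some(i), // a new selected run begins
            (false, Some(s)) => {
                ranges.push((s, i)); // the current run ends before row i
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        ranges.push((s, mask.len())); // close a run that reaches the last row
    }
    ranges
}

/// Assumed heuristic: use ranges when the average selected run is at least
/// `min_avg_run_len` rows long, otherwise keep the mask.
fn choose_representation(mask: Vec<bool>, min_avg_run_len: usize) -> FilterResult {
    let ranges = selected_ranges(&mask);
    let selected: usize = ranges.iter().map(|(s, e)| e - s).sum();
    // An empty selection is expressed as zero ranges.
    if ranges.is_empty() || selected / ranges.len() >= min_avg_run_len {
        FilterResult::Ranges(ranges)
    } else {
        FilterResult::Mask(mask)
    }
}

/// Apply either representation to a column of values, returning the kept rows.
fn apply<T: Clone>(values: &[T], filter: &FilterResult) -> Vec<T> {
    match filter {
        // Copy whole slices at a time.
        FilterResult::Ranges(ranges) => ranges
            .iter()
            .flat_map(|&(s, e)| values[s..e].iter().cloned())
            .collect(),
        // Test each row individually.
        FilterResult::Mask(mask) => values
            .iter()
            .zip(mask)
            .filter_map(|(v, &keep)| keep.then(|| v.clone()))
            .collect(),
    }
}

fn main() {
    let values: Vec<i32> = (0..10).collect();
    // Clustered selection: rows 0..4 and 6..10.
    let clustered = vec![true, true, true, true, false, false, true, true, true, true];
    // Scattered selection: every other row.
    let scattered: Vec<bool> = (0..10).map(|i| i % 2 == 0).collect();

    let a = choose_representation(clustered, 2);
    let b = choose_representation(scattered, 2);
    println!("{:?} -> {:?}", a, apply(&values, &a));
    println!("{:?} -> {:?}", b, apply(&values, &b));
}
```

With a threshold of 2, the clustered mask becomes two ranges while the scattered one stays a mask. Ranges let a consumer copy or skip whole slices at once, which pays off for clustered selections; a mask avoids the per-range bookkeeping when the selection is fragmented. Any real threshold would have to be chosen by benchmarking.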