alamb commented on issue #7456: URL: https://github.com/apache/arrow-rs/issues/7456#issuecomment-2887030974
@zhuqi-lucas and I have been working on various strategies / structures to make the filtering faster. I believe we now have enough evidence to proceed with a more sophisticated implementation.

Specifically,
- @zhuqi-lucas has shown the hybrid Filter/RowSelection approach works well in https://github.com/apache/arrow-rs/pull/7454
- I have shown the idea of reusing actual filter results works well in https://github.com/apache/arrow-rs/pull/7513 (though I still need to work out how to limit memory usage more, potentially adaptively)

Thus my next steps will be:
1. Create a few refactoring PRs that get the predicate code into shape

Perhaps then @zhuqi-lucas can help port the hybrid Filter/RowSelection to the `ReadPlan` to get better performance without changing any public interfaces