XiangpengHao commented on issue #5523: URL: https://github.com/apache/arrow-rs/issues/5523#issuecomment-2429470872
> is that evaluating predicates in ArrowFilter (aka pushed down predicates) is never worse than decoding the columns first and then filtering them with the filter kernel

This is an excellent summary of the goal, and it aligns well with my current project. Since I have gone quite far on this, I want to share some of the issues I have encountered:

- [ ] Very fast row selection, as described in this ticket.
- [ ] Avoid decoding the predicate columns twice, potentially at the cost of higher memory usage.
- [ ] Adaptively `slice` or `filter` the resulting array: if the selection is sparse, we should `filter`/`take`; otherwise we should `slice`.
- [ ] Coalesce the resulting record batches. Since the filter has been pushed down into ParquetExec, there is no FilterExec and therefore no CoalesceBatchesExec, so ParquetExec itself must emit coalesced record batches.
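The adaptive `slice`-vs-`filter` point above can be sketched as a simple selectivity cutoff. This is a minimal sketch with plain types standing in for Arrow structures: `Selector`, `choose_strategy`, and the cutoff value are illustrative assumptions, not arrow-rs API, though `Selector` is modeled loosely on parquet's `RowSelector`.

```rust
/// A run of rows to skip or select, loosely modeled on parquet's
/// `RowSelector` (names here are illustrative, not arrow-rs API).
#[derive(Clone, Copy)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

#[derive(Debug, PartialEq)]
pub enum Strategy {
    /// Dense selection: take (mostly zero-copy) slices of the decoded array.
    Slice,
    /// Sparse selection: materialize via a `filter`/`take`-style gather.
    Filter,
}

/// Pick a strategy from the fraction of rows the selection keeps.
/// `dense_cutoff` (e.g. 0.5) is a tunable, assumed threshold.
pub fn choose_strategy(selection: &[Selector], dense_cutoff: f64) -> Strategy {
    let total: usize = selection.iter().map(|s| s.row_count).sum();
    let kept: usize = selection
        .iter()
        .filter(|s| !s.skip)
        .map(|s| s.row_count)
        .sum();
    if total == 0 || kept as f64 / total as f64 >= dense_cutoff {
        Strategy::Slice
    } else {
        Strategy::Filter
    }
}
```

A real implementation would likely also weigh the number of runs (many tiny kept runs make slicing expensive even at high selectivity), but the selectivity ratio is the core signal.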

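The batch-coalescing item can likewise be sketched as a small buffering accumulator, similar in spirit to what DataFusion's CoalesceBatchesExec does. This is a hypothetical sketch: `Vec<i32>` stands in for a `RecordBatch`, and `Coalescer` is an assumed name, not an existing type.

```rust
/// Buffer small "batches" (plain `Vec<i32>` standing in for RecordBatch)
/// and emit one concatenated batch once `target_rows` rows accumulate.
pub struct Coalescer {
    target_rows: usize,
    buffered: Vec<Vec<i32>>,
    buffered_rows: usize,
}

impl Coalescer {
    pub fn new(target_rows: usize) -> Self {
        Self { target_rows, buffered: Vec::new(), buffered_rows: 0 }
    }

    /// Push a batch; returns a coalesced batch once enough rows are buffered.
    pub fn push(&mut self, batch: Vec<i32>) -> Option<Vec<i32>> {
        self.buffered_rows += batch.len();
        self.buffered.push(batch);
        if self.buffered_rows >= self.target_rows {
            self.buffered_rows = 0;
            Some(self.buffered.drain(..).flatten().collect())
        } else {
            None
        }
    }

    /// Flush whatever remains at end of stream.
    pub fn finish(&mut self) -> Option<Vec<i32>> {
        if self.buffered.is_empty() {
            None
        } else {
            self.buffered_rows = 0;
            Some(self.buffered.drain(..).flatten().collect())
        }
    }
}
```

With highly selective pushed-down filters, each decoded batch may contain only a handful of rows, so doing this inside the scan avoids flooding downstream operators with tiny batches.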