XiangpengHao commented on issue #5523: URL: https://github.com/apache/arrow-rs/issues/5523#issuecomment-2429470872
> is that evaluating predicates in ArrowFilter (aka pushed down predicates) is never worse than decoding the columns first and then filtering them with the filter kernel

This is an excellent summary of the goal, and it aligns well with my current project. Since I have gone quite far on this, I want to share some of the issues I have encountered:

- [ ] Very fast row selection, as described in this ticket.
- [ ] Avoid decoding the predicate columns twice, potentially at the cost of higher memory usage.
- [ ] Adaptively `slice` or `filter` the resulting array: if the selection is sparse, we should `filter`/`take`; otherwise we should `slice`.
- [ ] Coalesce the resulting record batches. Since the filter has been pushed down into ParquetExec, there is no FilterExec and therefore no CoalesceBatchesExec, so ParquetExec itself must emit coalesced record batches.
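The adaptive `slice`-vs-`filter` point above can be sketched as a simple selectivity cutoff. This is a minimal sketch with plain types standing in for Arrow structures: `Selector`, `choose_strategy`, and the cutoff value are illustrative assumptions, not arrow-rs API, though `Selector` is modeled loosely on parquet's `RowSelector`.

```rust
/// A run of rows to skip or select, loosely modeled on parquet's
/// `RowSelector` (names here are illustrative, not arrow-rs API).
#[derive(Clone, Copy)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

#[derive(Debug, PartialEq)]
pub enum Strategy {
    /// Dense selection: take (mostly zero-copy) slices of the decoded array.
    Slice,
    /// Sparse selection: materialize via a `filter`/`take`-style gather.
    Filter,
}

/// Pick a strategy from the fraction of rows the selection keeps.
/// `dense_cutoff` (e.g. 0.5) is a tunable, assumed threshold.
pub fn choose_strategy(selection: &[Selector], dense_cutoff: f64) -> Strategy {
    let total: usize = selection.iter().map(|s| s.row_count).sum();
    let kept: usize = selection
        .iter()
        .filter(|s| !s.skip)
        .map(|s| s.row_count)
        .sum();
    if total == 0 || kept as f64 / total as f64 >= dense_cutoff {
        Strategy::Slice
    } else {
        Strategy::Filter
    }
}
```

A real implementation would likely also weigh the number of runs (many tiny kept runs make slicing expensive even at high selectivity), but the selectivity ratio is the core signal.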

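The batch-coalescing item can likewise be sketched as a small buffering accumulator, similar in spirit to what DataFusion's CoalesceBatchesExec does. This is a hypothetical sketch: `Vec<i32>` stands in for a `RecordBatch`, and `Coalescer` is an assumed name, not an existing type.

```rust
/// Buffer small "batches" (plain `Vec<i32>` standing in for RecordBatch)
/// and emit one concatenated batch once `target_rows` rows accumulate.
pub struct Coalescer {
    target_rows: usize,
    buffered: Vec<Vec<i32>>,
    buffered_rows: usize,
}

impl Coalescer {
    pub fn new(target_rows: usize) -> Self {
        Self { target_rows, buffered: Vec::new(), buffered_rows: 0 }
    }

    /// Push a batch; returns a coalesced batch once enough rows are buffered.
    pub fn push(&mut self, batch: Vec<i32>) -> Option<Vec<i32>> {
        self.buffered_rows += batch.len();
        self.buffered.push(batch);
        if self.buffered_rows >= self.target_rows {
            self.buffered_rows = 0;
            Some(self.buffered.drain(..).flatten().collect())
        } else {
            None
        }
    }

    /// Flush whatever remains at end of stream.
    pub fn finish(&mut self) -> Option<Vec<i32>> {
        if self.buffered.is_empty() {
            None
        } else {
            self.buffered_rows = 0;
            Some(self.buffered.drain(..).flatten().collect())
        }
    }
}
```

With highly selective pushed-down filters, each decoded batch may contain only a handful of rows, so doing this inside the scan avoids flooding downstream operators with tiny batches.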