alamb commented on issue #2962:
URL: 
https://github.com/apache/arrow-datafusion/issues/2962#issuecomment-1197978724

   >  Maybe those two types of pruning should be part of the parquet arrow 
project. 
   
   I suspect additional filter pushdown will require changes in both the 
parquet reader and datafusion.
   
   I think there is work underway by @Ted-Jiang, @liukun4515, @thinkharderdev, 
and @tustvold to implement "Page Pruning", which may be what you are 
referring to here (it allows the parquet reader to skip materializing/decoding 
positions based on evaluating the predicates). The work is partially 
described in https://github.com/apache/arrow-rs/issues/1191
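
   To illustrate the general idea of page pruning, here is a minimal sketch 
that assumes each page carries min/max statistics for one integer column and 
that the predicate is a simple equality. `PageStats`, `pages_to_read`, and 
`rows_skipped` are hypothetical names made up for this example; they are not 
the arrow-rs or datafusion API, and the real work may look quite different.

```rust
/// Hypothetical per-page statistics for a single i64 column.
/// (Assumption: real Parquet page indexes carry typed min/max values.)
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
    row_count: usize,
}

/// For each page, report whether it may contain rows where `column == target`.
/// A page whose [min, max] range excludes `target` can be skipped without decoding.
fn pages_to_read(pages: &[PageStats], target: i64) -> Vec<bool> {
    pages
        .iter()
        .map(|p| p.min <= target && target <= p.max)
        .collect()
}

/// Count the rows that never need to be decoded, given the keep mask.
fn rows_skipped(pages: &[PageStats], keep: &[bool]) -> usize {
    pages
        .iter()
        .zip(keep)
        .filter(|(_, k)| !**k)
        .map(|(p, _)| p.row_count)
        .sum()
}
```

   Note that a page that is kept may still contain no matching rows; 
statistics can only rule pages out, so the predicate still has to be evaluated 
on whatever gets decoded.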
   
   In terms of using parquet bloom filters, I suspect that would also need work 
in both parquet and datafusion, and I don't know of any efforts underway to do 
so. @shanisolomon added initial support to expose the bloom filter metadata in 
https://github.com/apache/arrow-rs/pull/1309 and [follow 
on](https://github.com/apache/arrow-rs/pulls?q=is%3Apr+bloom+is%3Aclosed) PRs, 
but I believe they then implemented the bloom filtering in a closed-source 
project (cc @zeevm, who might know more).


