westonpace commented on issue #33683:
URL: https://github.com/apache/arrow/issues/33683#issuecomment-1544003325

   Support for reading bloom filters from parquet files into memory was added 
in 12.0.0.  There is an open issue for using this feature to do pushdown 
filtering here: https://github.com/apache/arrow/issues/27277
   
   The datasets feature already does some pushdown filtering using the parquet 
file statistics (for example, per-row-group min/max values).  That issue asks 
for the bloom filter to be used as well when doing pushdown filtering for 
datasets.
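
   As a rough illustration of what statistics-based and bloom-filter-based 
pushdown mean for an equality filter, here is a toy Python sketch.  Nothing in 
it is Arrow or Parquet API: `ToyBloomFilter`, `RowGroup`, and 
`row_groups_to_read` are hypothetical names, and the filter is a simple 
SHA-256-based one, not the split-block bloom filter that the Parquet format 
specifies.

```python
import hashlib


class ToyBloomFilter:
    """Toy bloom filter (assumption: not the Parquet split-block layout)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class RowGroup:
    """Hypothetical stand-in for one row group of a single column."""

    def __init__(self, values):
        self.values = list(values)
        self.min_value = min(self.values)  # plays the role of file statistics
        self.max_value = max(self.values)
        self.bloom = ToyBloomFilter()
        for v in self.values:
            self.bloom.add(v)


def row_groups_to_read(row_groups, target):
    """Indices of row groups that may contain `target` (filter: col == target)."""
    selected = []
    for i, rg in enumerate(row_groups):
        if not (rg.min_value <= target <= rg.max_value):
            continue  # skipped using min/max statistics
        if not rg.bloom.might_contain(target):
            continue  # skipped using the bloom filter
        selected.append(i)
    return selected


groups = [RowGroup([1, 5, 9]), RowGroup([100, 200, 300]), RowGroup([2, 50, 400])]
result = row_groups_to_read(groups, 200)
print(result)
```

   The first group is skipped by statistics alone.  The third group has a 
min/max range that covers 200 even though the value is absent, so statistics 
cannot rule it out; that is the case where a bloom filter can help, subject to 
its false-positive rate.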
   
   The parquet reader itself hasn't done pushdown in the past, but I'd be 
generally in favor of moving the pushdown filtering out of the datasets layer 
and into the file reader layer itself if someone were motivated to do the work.  
That would be more complex than just adding bloom filter support to the 
datasets layer, though, because you'd have to figure out how to formulate 
filter expressions.  You could add a dependency on arrow expressions, but I'm 
not sure that makes sense in the parquet layer.

