alamb opened a new issue, #8685:
URL: https://github.com/apache/arrow-datafusion/issues/8685

> How do you know the bloom filter isn't being used? Is there a reproducer (a parquet file) you can share?
>
> It appears that there is no good way to know if the bloom filter code is working via logging or metrics 🤔
>
> https://github.com/apache/arrow-datafusion/blob/f39c040ace0b34b0775827907aa01d6bb71cbb14/datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs#L111-L168
   
I conducted a test locally by writing 200 GB of data. When using the Bloom filter, the query takes only 0.1 seconds, whereas without the Bloom filter it takes 1 second. So if a query takes 1 second, I infer that it is not using the Bloom filter, because a query that uses the Bloom filter should return within 0.1 seconds.
   
_Originally posted by @domyway in https://github.com/apache/arrow-datafusion/issues/8436#issuecomment-1871854464_
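
The problem described above is that the only available signal is query latency: there is no log line or metric saying whether Bloom filter pruning ran. A minimal sketch of the kind of signal that would answer the question directly is a per-row-group Bloom filter check that increments counters as it prunes. This is illustrative Python, not DataFusion's Rust code. `BloomFilter`, `PruningMetrics`, and `prune_row_groups` are hypothetical names, and the SHA-256-based hashing is a stand-in, not the Parquet split-block Bloom filter format.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter (hypothetical): num_bits slots, num_hashes seeded SHA-256 hashes."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value: str) -> Iterator[int]:
        # Derive num_hashes bit positions by seeding the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


@dataclass
class PruningMetrics:
    """Hypothetical counters that make Bloom filter pruning observable."""
    row_groups_checked: int = 0
    row_groups_pruned_by_bloom_filter: int = 0


def prune_row_groups(filters: list[BloomFilter], value: str,
                     metrics: PruningMetrics) -> list[int]:
    """Return indices of row groups that may contain `value` (equality predicate)."""
    keep = []
    for idx, bf in enumerate(filters):
        metrics.row_groups_checked += 1
        if bf.might_contain(value):
            keep.append(idx)
        else:
            metrics.row_groups_pruned_by_bloom_filter += 1
    return keep


# Usage: three row groups, each with its own filter built from its column values.
row_groups = [["alice", "bob"], ["carol"], ["dave", "erin"]]
filters = []
for values in row_groups:
    bf = BloomFilter()
    for v in values:
        bf.insert(v)
    filters.append(bf)

metrics = PruningMetrics()
kept = prune_row_groups(filters, "carol", metrics)
print(kept, metrics)
```

With counters like these reported per scan, a nonzero pruned count would show directly that the filter was consulted and effective, instead of inferring it from a 0.1 s versus 1 s difference. Because Bloom filters allow false positives but no false negatives, the row group holding the value is always kept. Occasionally a non-matching group may also be kept.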
               

