hengfeiyang commented on PR #7821:
URL: https://github.com/apache/arrow-datafusion/pull/7821#issuecomment-1762466012

   To test this feature, you need to enable the following option, which defaults to `false`:
   
   `datafusion.execution.parquet.bloom_filter_enabled`
   
   Alternatively, set the corresponding environment variable:
   ```
   DATAFUSION_EXECUTION_PARQUET_BLOOM_FILTER_ENABLED=true
   ```
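   
   The environment variable name mirrors the option key: dots become underscores and the name is upper-cased. A minimal sketch of that mapping follows; the `key` and `env_name` shell variables are only illustrative and are not part of DataFusion:
   
   ```shell
   # Option key as given above.
   key="datafusion.execution.parquet.bloom_filter_enabled"
   
   # Derive the environment variable name: dots -> underscores, then upper-case.
   env_name=$(printf '%s' "$key" | tr '.' '_' | tr '[:lower:]' '[:upper:]')
   echo "$env_name"
   
   # Export it for the current shell session, so that a datafusion-cli
   # started afterwards from this shell inherits the setting.
   export "$env_name=true"
   ```
   
   Exporting the variable this way only affects processes started from the same shell; the `set` statement inside the CLI session changes the option for that session alone.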
   
   Here is my test, using the `datafusion-cli` built from this branch:
   
   ```
   DataFusion CLI v32.0.0
   ❯ create external table tbl stored as parquet location '/Users/yanghengfei/code/rust/github.com/zinclabs/openobserve/data/bloomfilter/stream/files/default/logs/traces/';
   0 rows in set. Query took 0.042 seconds.
   
   ❯ select count(*) from tbl;
   +----------+
   | COUNT(*) |
   +----------+
   | 60992352 |
   +----------+
   1 row in set. Query took 0.083 seconds.
   
   ❯ SELECT * FROM tbl where (_timestamp >= 1694908802204000 AND _timestamp < 1696593531571122) AND trace_id='3c7dbf90d1a66e3faffa344519c3bac0' LIMIT 150;
   +------------------+----------+---------------------+--------+-------+...
   | _timestamp       | duration | end_time            | events | flags | trace_id  |...
   +------------------+----------+---------------------+--------+-------+...
   | 1694984600199046 | 2008886  | 1694984602207932061 | []     | 1     | ... 
   | 1694984600199000 | 2009958  | 1694984602208958875 | []     | 1     | ... 
   +------------------+----------+---------------------+--------+-------+...
   2 rows in set. Query took 4.086 seconds.
   
   ❯ set datafusion.execution.parquet.bloom_filter_enabled=true;
   0 rows in set. Query took 0.000 seconds.
   
   ❯ SELECT * FROM tbl where (_timestamp >= 1694908802204000 AND _timestamp < 1696593531571122) AND trace_id='3c7dbf90d1a66e3faffa344519c3bac0' LIMIT 150;
   +------------------+----------+---------------------+--------+-------+...
   | _timestamp       | duration | end_time            | events | flags | trace_id  |...
   +------------------+----------+---------------------+--------+-------+...
   | 1694984600199046 | 2008886  | 1694984602207932061 | []     | 1     | ... 
   | 1694984600199000 | 2009958  | 1694984602208958875 | []     | 1     | ... 
   +------------------+----------+---------------------+--------+-------+...
   2 rows in set. Query took 0.234 seconds.
   ```
   
   
   Details of the data directory:
   
   ```shell
   $ cd /Users/yanghengfei/code/rust/github.com/zinclabs/openobserve/data/bloomfilter/stream/files/default/logs/traces/
   $ du -sh . 
   2.8G .
   $ find . -name "*.parquet"|wc -l 
        326
   ```
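   
   To put the figures above in proportion, here is a rough back-of-the-envelope calculation. The variable names are illustrative, and it assumes `du -sh` reports 1024-based units:
   
   ```shell
   # Figures copied from the CLI session and the directory listing above.
   without_bf=4.086   # seconds, bloom filter disabled
   with_bf=0.234      # seconds, bloom filter enabled
   rows=60992352      # result of count(*)
   files=326          # number of parquet files
   size_gib=2.8       # du -sh output (assumed to be GiB)
   
   # Ratio of the two query times.
   speedup=$(awk -v a="$without_bf" -v b="$with_bf" 'BEGIN { printf "%.1f", a / b }')
   # Average number of rows per parquet file.
   rows_per_file=$(awk -v r="$rows" -v f="$files" 'BEGIN { printf "%.0f", r / f }')
   # Average file size in MiB.
   mib_per_file=$(awk -v s="$size_gib" -v f="$files" 'BEGIN { printf "%.1f", s * 1024 / f }')
   
   echo "speedup: ${speedup}x"
   echo "rows per file: ${rows_per_file}"
   echo "MiB per file: ${mib_per_file}"
   ```
   
   These are single-run timings, so the ratio is indicative rather than a benchmark.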
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
