alamb opened a new issue, #4005:
URL: https://github.com/apache/arrow-datafusion/issues/4005

   **Describe the bug**
   DataFusion returns different answers when Parquet filter pushdown is enabled.
   
   NOTE that page index filtering is not enabled by default (as we are still working on it), so this issue is unlikely to affect users.
   
   **To Reproduce**
   1. Download data from [repro.zip](https://github.com/apache/arrow-datafusion/files/9890904/repro.zip)
   2. Run `datafusion-cli` (commands shown below)
   
   The query run is:
   ```sql
   select count(*) from foo
   where container = 'backend_container_0' OR pod = 'aqcathnxqsphdhgjtgvxsfyiwbmhlmg';
   ```
   
   **Expected behavior**
   The same answer should be produced with and without page index filtering enabled. However, the answers differ.
   
   
   Without filter pushdown, the count is `39982`:
   
   ```shell
   $ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=false datafusion-cli -f script.sql
   ...
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 39982           |
   +-----------------+
   ```
   
   With filter pushdown enabled, the count is `0`:
   
   ```shell
   $ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 0               |
   +-----------------+
   1 row in set. Query took 0.004 seconds.
   ```
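   
   The two runs above can be compared automatically. The following is a minimal sketch, not part of the original report: the function names `extract_count`, `run_with_pushdown`, and `compare_pushdown` are hypothetical, and it assumes `datafusion-cli` is on `PATH`, `script.sql` is in the current directory, and the result is printed as the ASCII table shown above.
   
   ```shell
   # Pull the first purely numeric table cell out of datafusion-cli output on stdin.
   # Assumes rows look like "| 39982           |", as in the output above.
   extract_count() {
     awk -F'|' 'NF >= 3 { gsub(/ /, "", $2); if ($2 ~ /^[0-9]+$/) print $2 }' | head -n 1
   }
   
   # Run script.sql with filter pushdown set to $1 ("true" or "false").
   run_with_pushdown() {
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS="$1" datafusion-cli -f script.sql | extract_count
   }
   
   # Report whether both settings produce the same count; returns 1 on mismatch.
   compare_pushdown() {
     local off on
     off="$(run_with_pushdown false)"
     on="$(run_with_pushdown true)"
     if [ "$off" = "$on" ]; then
       echo "OK: both runs returned $off"
     else
       echo "MISMATCH: pushdown off=$off, pushdown on=$on"
       return 1
     fi
   }
   ```
   
   Calling `compare_pushdown` after sourcing these functions should report a mismatch for as long as the bug is present.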
   
   **Additional context**
   Found by the test here https://github.com/apache/arrow-datafusion/pull/3976

