alamb opened a new issue, #4005: URL: https://github.com/apache/arrow-datafusion/issues/4005
**Describe the bug**
DataFusion gets different answers when parquet pushdown is enabled.

NOTE that page index filtering is not enabled by default (as we are still working on it), so this issue will not likely affect users.

**To Reproduce**
1. Download data from [repro.zip](https://github.com/apache/arrow-datafusion/files/9890904/repro.zip)
2. Run datafusion CLI

The query run is:

```sql
select count(*) from foo where container = 'backend_container_0' OR pod = 'aqcathnxqsphdhgjtgvxsfyiwbmhlmg';
```

**Expected behavior**
The same answer should be produced with and without page index filtering enabled. However, the answers are different.

Without filter pushdown, `39982` rows are produced:

```shell
$ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=false datafusion-cli -f script.sql
...
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 39982           |
+-----------------+
```

With it enabled:

```shell
$ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 0               |
+-----------------+
1 row in set. Query took 0.004 seconds.
```

**Additional context**
Found by the test here: https://github.com/apache/arrow-datafusion/pull/3976
