alamb opened a new issue, #4006:
URL: https://github.com/apache/arrow-datafusion/issues/4006

   **Describe the bug**
   DataFusion produces a different result (an error instead of the count) when Parquet filter pushdown is enabled.
   
   NOTE: pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.
   
   **To Reproduce**
   1. Download the data from [repro.zip](https://github.com/apache/arrow-datafusion/files/9890904/repro.zip)
   2. Run the DataFusion CLI (`datafusion-cli`)
   
   The query is:
   ```sql
   select count(*) from foo where request_duration_ns > 791684060 OR client_addr NOT in ('213.120.214.213');
   ```
   
   **Expected behavior**
   The same answer should be produced with and without row filtering enabled. However, with row filtering enabled, an error is produced.

   Without row filtering:
   
   ```shell
   datafusion-cli -f script.sql 
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 53819           |
   +-----------------+
   1 row in set. Query took 0.006 seconds.
   ```
   
   With it enabled:
   
   ```shell
   DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
   ...
   1 row in set. Query took 0.002 seconds.
   ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression Gt with types Utf8 and Int32\")")))
   ```
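
   To make the with/without comparison repeatable, a small helper along these lines could run the same script twice and compare the results. This is only a sketch: `compare_pushdown` and the `CLI` override variable are hypothetical names, and it assumes `script.sql` already registers `foo` and contains the query above. The `Query took ...` lines are filtered out because timings differ between runs even when the results agree.

   ```shell
   # Sketch: run one SQL script with and without Parquet filter pushdown
   # and report whether the results agree.
   # Hypothetical names: compare_pushdown, CLI (binary override).
   # Assumes script.sql registers `foo` and contains the query above.
   compare_pushdown() {
     local script="$1"
     local cli="${CLI:-datafusion-cli}"  # binary to run; overridable via CLI
     local off on

     # Baseline run: pushdown filtering disabled (the default).
     # "Query took" timing lines are dropped since they vary run to run.
     off=$("$cli" -f "$script" 2>&1 | grep -v 'Query took')

     # Same script with pushdown filtering enabled through the environment.
     on=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true "$cli" -f "$script" 2>&1 | grep -v 'Query took')

     if [ "$off" = "$on" ]; then
       echo "outputs match"
     else
       echo "outputs differ"
       echo "--- pushdown disabled ---"
       printf '%s\n' "$off"
       echo "--- pushdown enabled ---"
       printf '%s\n' "$on"
     fi
   }
   ```

   Usage: `compare_pushdown script.sql`. Given the outputs above, it would be expected to print `outputs differ` followed by both results.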
   
   **Additional context**
   Found by the test in https://github.com/apache/arrow-datafusion/pull/3976

