alamb opened a new issue, #4046:
URL: https://github.com/apache/arrow-datafusion/issues/4046

   **Describe the bug**
   DataFusion generates an error for some predicates when predicate pushdown is 
enabled. 
   
   NOTE: This is the same symptom as reported in 
https://github.com/apache/arrow-datafusion/issues/4006 but with a different 
predicate.
   
   NOTE that pushdown filtering is not enabled by default (as we are still 
working on it), so this issue is unlikely to affect users.
   
   **To Reproduce**
   1. Download data from 
[repro.zip](https://github.com/apache/arrow-datafusion/files/9902718/repro.zip)
   2. Run datafusion CLI
   
   The query run is
   ```sql
   select count(*) from foo where request_method != 'GET' OR response_status = 
400 OR service = 'backend';
   ```
   
   I tested this using master at 
https://github.com/apache/arrow-datafusion/commit/35f786bb6ce33cbd58db3e16a46958b58f7676f4,
 which includes the fix for #4006 in 
https://github.com/apache/arrow-datafusion/commit/5cf090a13391501c0ce7707ac7a1e50e18517b79
   
   
   ```shell
   $ git status
   Your branch is up to date with 'apache/master'.
   
   nothing to commit, working tree clean
   $ git rev-parse HEAD
   5cf090a13391501c0ce7707ac7a1e50e18517b79
   ```
   
   **Expected behavior**
   The same answer should be produced with and without row filtering enabled. 
However, with row filtering enabled, an error is produced. Without it:
   
   ```shell
   datafusion-cli -f script.sql
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 53819           |
   +-----------------+
   1 row in set. Query took 0.006 seconds.
   ```
   
   With it enabled:
   
   ```shell
   DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f 
script.sql
   ...
   1 row in set. Query took 0.021 seconds.
   ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: 
underlying Arrow error: Compute error: Error evaluating filter predicate: 
Internal(\"Cannot evaluate binary expression NotEq with types UInt16 and 
Utf8\")")))
   ```
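
The with/without comparison above can be scripted. What follows is only a minimal sketch: `strip_timing` and `compare_pushdown` are hypothetical helper names (not part of DataFusion), and it assumes `datafusion-cli` is on the `PATH`, `script.sql` comes from the repro archive, and `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS` is not already set in the calling environment.

```shell
# Hypothetical helper: drop the "Query took ..." line, which changes between
# runs even when the query result itself is identical.
strip_timing() {
    sed '/Query took/d'
}

# Hypothetical helper: run the given command twice, first as-is (pushdown
# off, assuming the variable is unset) and then with pushdown enabled, and
# print MATCH or MISMATCH followed by both outputs.
# The body runs in a subshell so its variables do not leak into the caller.
compare_pushdown() (
    off=$("$@" 2>&1 | strip_timing)
    on=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true "$@" 2>&1 | strip_timing)
    if [ "$off" = "$on" ]; then
        echo "MATCH"
    else
        printf 'MISMATCH\n--- pushdown off ---\n%s\n--- pushdown on ---\n%s\n' "$off" "$on"
        exit 1
    fi
)
```

For this report it would be invoked as `compare_pushdown datafusion-cli -f script.sql`. Given the outputs shown above, it should report a mismatch until the bug is fixed.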
   
   **Additional context**
   Found by the test here https://github.com/apache/arrow-datafusion/pull/3976
   

