andygrove opened a new issue, #4071:
URL: https://github.com/apache/arrow-datafusion/issues/4071

   **Describe the bug**
   
   I am comparing Spark and DataFusion behavior against a Parquet file with two versions of a filter predicate that should be equivalent:
   
   - `l_discount between 0.05 and 0.07`
   - `l_discount between 0.06-0.01 and 0.06+0.01`
   
   ## Spark
   
   scala> spark.read.parquet("/mnt/bigdata/tpch/sf1-parquet/lineitem").createTempView("lineitem")
   
   scala> spark.sql("SELECT count(*) from lineitem where l_discount between 0.05 and 0.07").show
   +--------+
   |count(1)|
   +--------+
   |16361562|
   +--------+
   
   scala> spark.sql("SELECT count(*) from lineitem where l_discount between 0.06-0.01 and 0.06+0.01").show
   +--------+
   |count(1)|
   +--------+
   |16361562|
   +--------+
   
   ## DataFusion
   
   ❯ create external table lineitem stored as parquet location '/mnt/bigdata/tpch/sf1-parquet/lineitem';
   0 rows in set. Query took 0.015 seconds.
   
   ❯ select count(*) from lineitem where l_discount between 0.05 and 0.07;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 16361562        |
   +-----------------+
   1 row in set. Query took 0.487 seconds.
   
   ❯ select count(*) from lineitem where l_discount between 0.06-0.01 and 0.06+0.01;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 10908630        |
   +-----------------+
   1 row in set. Query took 0.394 seconds.
   
   So `between 0.05 and 0.07` returns the same count (16361562) in Spark and DataFusion, but `between 0.06-0.01 and 0.06+0.01` does not: Spark still returns 16361562, while DataFusion returns 10908630.
   
   **To Reproduce**
   See above
   
   **Expected behavior**
   Both predicates should return the same count in DataFusion (16361562), as they do in Spark.
   
   **Additional context**
   Discovered as part of https://github.com/apache/arrow-datafusion/issues/4024
   

