osawyerr opened a new issue, #6794:
URL: https://github.com/apache/arrow-datafusion/issues/6794
### Describe the bug
Datafusion gives incorrect results when running TPC-H Query 8 with parquet
files.
### To Reproduce
1. Generate TPC-H parquet files for scale factor 10
2. Open datafusion-cli and create external tables pointing to files
```sql
create external table lineitem stored as parquet location
'/path/to/lineitem/lineitem_1687987398_default/';
create external table customer stored as parquet location
'/path/to/customer/customer_1687987384_default/';
create external table nation stored as parquet location
'/path/to/nation/nation_1687988005_default/';
create external table orders stored as parquet location
'/path/to/orders/orders_1687988005_default/';
create external table region stored as parquet location
'/path/to/region/region_1687988223_default/';
create external table supplier stored as parquet location
'/path/to/supplier/supplier_1687988223_default/';
create external table part stored as parquet location
'/path/to/part/part_1687988133_default/';
```
3. Run TPC-H Query 8
```sql
select
o_year, sum(case when nation = 'BRAZIL' then volume else
0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (
1
- l_discount) as volume,
n2.n_name as nation
from part, supplier, lineitem, orders, customer, nation n1, nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED STEEL'
) as all_nations
group by o_year
order by o_year;
```
4. Incorrect results displayed below
```
+--------+-------------------------------------------+
| o_year | mkt_share |
+--------+-------------------------------------------+
| 1995.0 | -0.00000000000011380044067220119495060732 |
| 1996.0 | 0.00000000000019588288500717383285218261 |
+--------+-------------------------------------------+
2 rows in set. Query took 1.131 seconds.
```
### Expected behavior
The correct results should be:
1. From Postgres:
```
o_year | mkt_share
--------+------------------------
1995 | 0.03882014251433219622
1996 | 0.03948968749183991638
(2 rows)
```
2. From DuckDb (with same parquet files):
```
┌────────┬─────────────────────┐
│ o_year │ mkt_share │
│ int64 │ double │
├────────┼─────────────────────┤
│ 1995 │ 0.0388201425143322 │
│ 1996 │ 0.03948968749183992 │
└────────┴─────────────────────┘
Run Time (s): real 1.328 user 8.095730 sys 0.358842
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]