2010YOUY01 opened a new issue, #17774:
URL: https://github.com/apache/datafusion/issues/17774

   ### Describe the bug
   
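   datafusion-cli was compiled from the latest main commit, 5bbdb7eb1.
   
   To reproduce, generate the TPC-H dataset with 
https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli, update the 
paths below to point at your local files, and run the statements. The final 
query fails with the optimizer error shown after it.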
   
   ```
   CREATE EXTERNAL TABLE customer
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/customer.parquet';
   
   CREATE EXTERNAL TABLE nation
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/nation.parquet';
   
   CREATE EXTERNAL TABLE part
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/part.parquet';
   
   CREATE EXTERNAL TABLE region
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/region.parquet';
   
   CREATE EXTERNAL TABLE lineitem
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/lineitem.parquet';
   
   CREATE EXTERNAL TABLE orders
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/orders.parquet';
   
   CREATE EXTERNAL TABLE partsupp
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/partsupp.parquet';
   
   CREATE EXTERNAL TABLE supplier
   STORED AS PARQUET
   LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/supplier.parquet';
   
   WITH MonthlySales AS (
     SELECT
       DATE_TRUNC('month', o_orderdate) AS sales_month,
       SUM(l_extendedprice)             AS total_sales
     FROM orders AS o
     JOIN lineitem AS l
       ON o.o_orderkey = l.l_orderkey
     GROUP BY sales_month
   ),
   TopRegions AS (
     SELECT
       n.n_name                 AS region_name,
       SUM(ps.ps_supplycost)    AS region_supply_cost
     FROM partsupp AS ps
     JOIN supplier AS s
       ON ps.ps_suppkey = s.s_suppkey
     JOIN nation AS n
       ON s.s_nationkey = n.n_nationkey
     GROUP BY region_name
     ORDER BY region_supply_cost DESC
     LIMIT 5
   )
   SELECT
     ms.sales_month,
     ms.total_sales,
     tr.region_name,
     tr.region_supply_cost
   FROM MonthlySales AS ms
   LEFT JOIN TopRegions AS tr
     ON tr.region_supply_cost > (
          SELECT AVG(region_supply_cost) FROM TopRegions
        );
   
   Optimizer rule 'eliminate_cross_join' failed
   caused by
   Check optimizer-specific invariants after optimizer rule: 
eliminate_cross_join
   caused by
   Internal error: Failed due to a difference in schemas: original schema: 
DFSchema { inner: Schema { fields: [Field { name: "sales_month", data_type: 
Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }, Field { name: "total_sales", data_type: Decimal128(25, 
2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { 
name: "region_name", data_type: Utf8View, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }, Field { name: "region_supply_cost", 
data_type: Decimal128(25, 2), nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: 
"ms" }), Some(Bare { table: "ms" }), Some(Bare { table: "tr" }), Some(Bare { 
table: "tr" })], functional_dependencies: FunctionalDependencies { deps: 
[FunctionalDependence { source_indices: [0], target_indices: [0, 1], nullable: 
true, mode: Multi }, FunctionalDependence { source_indices: [2], 
target_indices: [2, 3], nullable: true, mode: Multi }] } }, new schema: 
DFSchema { inner: Schema { 
fields: [Field { name: "sales_month", data_type: Timestamp(Nanosecond, None), 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { 
name: "total_sales", data_type: Decimal128(25, 2), nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }, Field { name: "region_name", data_type: 
Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
Field { name: "region_supply_cost", data_type: Decimal128(25, 2), nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: 
"avg(topregions.region_supply_cost)", data_type: Decimal128(29, 6), nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, 
field_qualifiers: [Some(Bare { table: "ms" }), Some(Bare { table: "ms" }), 
Some(Bare { table: "tr" }), Some(Bare { table: "tr" }), Some(Bare { table: "tr" 
})], functional_dependencies: FunctionalDependencies { deps: 
[FunctionalDependence { source_indices: [0], target_indices: [0, 1], nullable: true, mode: 
Multi }, FunctionalDependence { source_indices: [2], target_indices: [2, 3], 
nullable: true, mode: Multi }] } }.
   This issue was likely caused by a bug in DataFusion's code. Please help us 
to resolve this by filing a bug report in our issue tracker: 
https://github.com/apache/datafusion/issues
   ```
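
The eight `CREATE EXTERNAL TABLE` statements above differ only in table name and file path, so they can be generated for a local data directory rather than edited by hand. The following is a minimal sketch, not part of the original report: the helper name `create_table_statements` and the directory `/path/to/tpch/data` are hypothetical, and it assumes each table is stored as a single `<table>.parquet` file directly under the data directory, as in the paths shown above.

```python
from pathlib import Path

# The eight TPC-H tables registered in the report above.
TPCH_TABLES = [
    "customer", "nation", "part", "region",
    "lineitem", "orders", "partsupp", "supplier",
]


def create_table_statements(data_dir: str) -> str:
    """Return one CREATE EXTERNAL TABLE statement per TPC-H table.

    Assumption: each table is a single file named <table>.parquet
    located directly under data_dir.
    """
    base = Path(data_dir)
    statements = []
    for table in TPCH_TABLES:
        location = (base / f"{table}.parquet").as_posix()
        statements.append(
            f"CREATE EXTERNAL TABLE {table}\n"
            f"STORED AS PARQUET\n"
            f"LOCATION '{location}';"
        )
    return "\n\n".join(statements)


if __name__ == "__main__":
    # Hypothetical data directory; replace it with your own.
    print(create_table_statements("/path/to/tpch/data"))
```

The printed statements can be pasted into datafusion-cli ahead of the failing query.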
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   Found by SQLStorm #17698


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

