neilconway opened a new issue, #20438:
URL: https://github.com/apache/datafusion/issues/20438

   ### Describe the bug
   
   In a Substrait plan containing correlated subqueries, references to outer 
query fields are resolved incorrectly: we only look at the inner (subquery) 
schema, which produces incorrect column names and types.
   
   ### To Reproduce
   
   For example, consider TPC-H Q21, which is parsed as:
   
   ```
           Projection: SUPPLIER.S_NAME, count(Int64(1)) AS NUMWAIT
              Limit: skip=0, fetch=100
                Sort: count(Int64(1)) DESC NULLS FIRST, SUPPLIER.S_NAME ASC NULLS LAST
                  Aggregate: groupBy=[[SUPPLIER.S_NAME]], aggr=[[count(Int64(1))]]
                    Projection: SUPPLIER.S_NAME
                      Filter: SUPPLIER.S_SUPPKEY = LINEITEM.L_SUPPKEY AND ORDERS.O_ORDERKEY = LINEITEM.L_ORDERKEY AND ORDERS.O_ORDERSTATUS = Utf8("F") AND LINEITEM.L_RECEIPTDATE > LINEITEM.L_COMMITDATE AND EXISTS (<subquery>) AND NOT EXISTS (<subquery>) AND SUPPLIER.S_NATIONKEY = NATION.N_NATIONKEY AND NATION.N_NAME = Utf8("SAUDI ARABIA")
                        Subquery:
                          Filter: LINEITEM.L_ORDERKEY = LINEITEM.L_TAX AND LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS
                            TableScan: LINEITEM
                        Subquery:
                          Filter: LINEITEM.L_ORDERKEY = LINEITEM.L_TAX AND LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS AND LINEITEM.L_RECEIPTDATE > LINEITEM.L_COMMITDATE
                            TableScan: LINEITEM
                        Cross Join:
                          Cross Join:
                            Cross Join:
                              TableScan: SUPPLIER
                              TableScan: LINEITEM
                            TableScan: ORDERS
                          TableScan: NATION
   ```
   
   Note that in the subquery, the filter has the clause 
`LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS`. This is not what Q21 contains: 
the right-hand side should refer to the outer query's `L_SUPPKEY`. In fact, the 
types of those two columns (Int64 and Utf8) are not even compatible, although 
we don't currently reject the comparison.
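
The wrong column names above are consistent with an outer-reference field index being looked up in the inner schema instead of the outer one. The sketch below illustrates that in plain Python; it is not DataFusion code. It assumes the standard TPC-H column order for SUPPLIER and LINEITEM, and the field indexes 7 and 9 are inferred from the plan output rather than read from the Substrait plan itself.

```python
# Column orders follow the TPC-H table definitions (assumed here).
SUPPLIER = ["S_SUPPKEY", "S_NAME", "S_ADDRESS", "S_NATIONKEY",
            "S_PHONE", "S_ACCTBAL", "S_COMMENT"]
LINEITEM = ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY", "L_LINENUMBER",
            "L_QUANTITY", "L_EXTENDEDPRICE", "L_DISCOUNT", "L_TAX",
            "L_RETURNFLAG", "L_LINESTATUS", "L_SHIPDATE", "L_COMMITDATE",
            "L_RECEIPTDATE", "L_SHIPINSTRUCT", "L_SHIPMODE", "L_COMMENT"]


def qualify(table, columns):
    """Return fully qualified column names such as 'LINEITEM.L_TAX'."""
    return [f"{table}.{c}" for c in columns]


# Schema of the subquery's own input (the inner LINEITEM scan).
inner_schema = qualify("LINEITEM", LINEITEM)

# Leading part of the outer schema: SUPPLIER cross-joined with LINEITEM.
# ORDERS and NATION follow in the real plan but are not needed here.
outer_schema = qualify("SUPPLIER", SUPPLIER) + qualify("LINEITEM", LINEITEM)

# Hypothetical field indexes carried by the two outer references
# (an assumption inferred from the plan output, not taken from the plan file).
outer_ref_indexes = [7, 9]

# Looking the indexes up in the inner schema reproduces the names in the plan.
buggy = [inner_schema[i] for i in outer_ref_indexes]

# Looking them up in the outer schema gives the columns Q21 actually compares.
expected = [outer_schema[i] for i in outer_ref_indexes]

print(buggy)
print(expected)
```

Under these assumptions, the inner-schema lookup yields `L_TAX` and `L_LINESTATUS`, matching the incorrect filter, while the outer-schema lookup yields `L_ORDERKEY` and `L_SUPPKEY`.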
   
   ### Expected behavior
   
   References to outer query fields should be resolved against the outer 
query's schema, so that they get the correct column names and types.
   
   ### Additional context
   
   _No response_

