pabadrubio opened a new issue, #22615:
URL: https://github.com/apache/datafusion/issues/22615

   ### Describe the bug
   
   `PushDownLeafProjections` throws an error when it tries to push down an 
expression through an `Unnest` node. For example, given this table:
   ```
   CREATE TABLE struct_and_list_table
   AS VALUES
       (struct(1, 2), [10, 20, 30]),
       (struct(3, 4), [40, 50]);
   ```
   
   When trying to run this query:
   ```
   SELECT sum(get_field(s, 'c0'))
   FROM (SELECT s, unnest(arr)
         FROM (SELECT column1 AS s, column2 AS arr
               FROM struct_and_list_table));
   ```
   the optimizer throws this error:
   
   ```
   Internal error: Assertion failed: expr.is_empty(): Unnest(Unnest { input: 
Projection(Projection { expr: [Alias(Alias { expr: Column(Column { relation: 
Some(Bare { table: "struct_and_list_table" }), name: "column1" }), relation: 
None, name: "s", metadata: None }), Alias(Alias { expr: Column(Column { 
relation: Some(Bare { table: "struct_and_list_table" }), name: "column2" }), 
relation: None, name: "__unnest_placeholder(arr)", metadata: None })], input: 
TableScan(TableScan { table_name: Bare { table: "struct_and_list_table" }, 
source: "...", projection: Some([0, 1]), projected_schema: DFSchema { inner: 
Schema { fields: [Field { name: "column1", data_type: Struct([Field { name: 
"c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, 
nullable: true }]), nullable: true }, Field { name: "column2", data_type: 
List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata: 
{} }, field_qualifiers: [Some(Bare { table: "struct_and_list_table" }), Some(Ba
 re { table: "struct_and_list_table" })], functional_dependencies: 
FunctionalDependencies { deps: [] } }, filters: [], fetch: None, .. }), schema: 
DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field 
{ name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", 
data_type: Int64, nullable: true }]), nullable: true }, Field { name: 
"__unnest_placeholder(arr)", data_type: List(Field { data_type: Int64, 
nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [None, 
None], functional_dependencies: FunctionalDependencies { deps: [] } } }), 
exec_columns: [Column { relation: None, name: "__unnest_placeholder(arr)" }], 
list_type_columns: [(1, ColumnUnnestList { output_column: Column { relation: 
None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 })], 
struct_type_columns: [], dependency_indices: [0, 1], schema: DFSchema { inner: 
Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0", 
data_type: Int64, nullable: true
  }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true 
}, Field { name: "__unnest_placeholder(arr,depth=1)", data_type: Int64, 
nullable: true }], metadata: {} }, field_qualifiers: [None, None], 
functional_dependencies: FunctionalDependencies { deps: [] } }, options: 
UnnestOptions { preserve_nulls: false, recursions: [RecursionUnnestOption { 
input_column: Column { relation: None, name: "__unnest_placeholder(arr)" }, 
output_column: Column { relation: None, name: 
"__unnest_placeholder(arr,depth=1)" }, depth: 1 }] } }) should have no exprs, 
got [Column(Column { relation: None, name: "__unnest_placeholder(arr)" })].
   ```
   
   The problem is in this line in function `try_push_into_inputs` in 
`extract_leaf_expressions.rs`:
   ```
   let new_node = node.with_new_exprs(node.expressions(), new_inputs)?;
   ```
   
   This doesn't work with `Unnest` nodes, as these nodes return the columns to 
unnest in the `expression()` functions, but don't accept them in the 
`with_new_expressions`, contrary to other nodes.
   
   But there is a secondary problem that would appear when the leaf expression 
refers to the unnested column, like in this case:
   
   ```
   CREATE TABLE list_struct_table
   AS VALUES
       ([struct(1, 'a'), struct(2, 'b')]),
       ([struct(3, 'c')]);
   
   SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
   FROM list_struct_table;
   ```
   
   The problem is in the logic that `try_push_into_inputs` uses to verify that 
an expression can be pushed through a node. It needs to make sure that the 
expressions to push through a node are in the input of that node to allow it to 
be pushed through. The problem is that for Unnest, the columns being unnest are 
going to have the same name as the input columns. Because the check compares 
just the names, it assumes this means that the columns referred in the 
expression are in the `Unnest` input and the check passes. But these are not 
the columns that the expression refers to. The expression is referring to the 
unnested columns, not to the input list columns, and therefore we can't push 
the expressions.
   
   ### To Reproduce
   
   First case:
   
   ```
   CREATE TABLE struct_and_list_table
   AS VALUES
       (struct(1, 2), [10, 20, 30]),
       (struct(3, 4), [40, 50]);
   
   SELECT sum(get_field(s, 'c0'))
   FROM (SELECT s, unnest(arr)
         FROM (SELECT column1 AS s, column2 AS arr
               FROM struct_and_list_table));
   ```
   
   Second case:
   ```
   CREATE TABLE list_struct_table
   AS VALUES
       ([struct(1, 'a'), struct(2, 'b')]),
       ([struct(3, 'c')]);
   
   query IT
   SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
   FROM list_struct_table;
   ```
   
   Both queries fail with:
   
   ```
   Internal error: Assertion failed: expr.is_empty():
   ```
   
   
   ### Expected behavior
   
   Both queries work.
   First query result:
   ```
   9
   ```
   
   Second query result:
   ```
   1 a
   2 b
   3 c
   ```
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to