pabadrubio opened a new issue, #22615:
URL: https://github.com/apache/datafusion/issues/22615
### Describe the bug
`PushDownLeafProjections` throws an error when it tries to push down an
expression through an `Unnest` node. For example, given this table:
```
CREATE TABLE struct_and_list_table
AS VALUES
(struct(1, 2), [10, 20, 30]),
(struct(3, 4), [40, 50]);
```
When trying to run this query:
```
SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
FROM (SELECT column1 AS s, column2 AS arr
FROM struct_and_list_table));
```
the optimizer throws this error:
```
Internal error: Assertion failed: expr.is_empty(): Unnest(Unnest { input:
Projection(Projection { expr: [Alias(Alias { expr: Column(Column { relation:
Some(Bare { table: "struct_and_list_table" }), name: "column1" }), relation:
None, name: "s", metadata: None }), Alias(Alias { expr: Column(Column {
relation: Some(Bare { table: "struct_and_list_table" }), name: "column2" }),
relation: None, name: "__unnest_placeholder(arr)", metadata: None })], input:
TableScan(TableScan { table_name: Bare { table: "struct_and_list_table" },
source: "...", projection: Some([0, 1]), projected_schema: DFSchema { inner:
Schema { fields: [Field { name: "column1", data_type: Struct([Field { name:
"c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64,
nullable: true }]), nullable: true }, Field { name: "column2", data_type:
List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata:
{} }, field_qualifiers: [Some(Bare { table: "struct_and_list_table" }), Some(Ba
re { table: "struct_and_list_table" })], functional_dependencies:
FunctionalDependencies { deps: [] } }, filters: [], fetch: None, .. }), schema:
DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field
{ name: "c0", data_type: Int64, nullable: true }, Field { name: "c1",
data_type: Int64, nullable: true }]), nullable: true }, Field { name:
"__unnest_placeholder(arr)", data_type: List(Field { data_type: Int64,
nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [None,
None], functional_dependencies: FunctionalDependencies { deps: [] } } }),
exec_columns: [Column { relation: None, name: "__unnest_placeholder(arr)" }],
list_type_columns: [(1, ColumnUnnestList { output_column: Column { relation:
None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 })],
struct_type_columns: [], dependency_indices: [0, 1], schema: DFSchema { inner:
Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0",
data_type: Int64, nullable: true
}, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true
}, Field { name: "__unnest_placeholder(arr,depth=1)", data_type: Int64,
nullable: true }], metadata: {} }, field_qualifiers: [None, None],
functional_dependencies: FunctionalDependencies { deps: [] } }, options:
UnnestOptions { preserve_nulls: false, recursions: [RecursionUnnestOption {
input_column: Column { relation: None, name: "__unnest_placeholder(arr)" },
output_column: Column { relation: None, name:
"__unnest_placeholder(arr,depth=1)" }, depth: 1 }] } }) should have no exprs,
got [Column(Column { relation: None, name: "__unnest_placeholder(arr)" })].
```
The problem is in this line in function `try_push_into_inputs` in
`extract_leaf_expressions.rs`:
```
let new_node = node.with_new_exprs(node.expressions(), new_inputs)?;
```
This doesn't work with `Unnest` nodes, as these nodes return the columns to
unnest in the `expression()` functions, but don't accept them in the
`with_new_expressions`, contrary to other nodes.
But there is a secondary problem that would appear when the leaf expression
refers to the unnested column, like in this case:
```
CREATE TABLE list_struct_table
AS VALUES
([struct(1, 'a'), struct(2, 'b')]),
([struct(3, 'c')]);
SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;
```
The problem is in the logic that `try_push_into_inputs` uses to verify that
an expression can be pushed through a node. It needs to make sure that the
expressions to push through a node are in the input of that node to allow it to
be pushed through. The problem is that for Unnest, the columns being unnest are
going to have the same name as the input columns. Because the check compares
just the names, it assumes this means that the columns referred in the
expression are in the `Unnest` input and the check passes. But these are not
the columns that the expression refers to. The expression is referring to the
unnested columns, not to the input list columns, and therefore we can't push
the expressions.
### To Reproduce
First case:
```
CREATE TABLE struct_and_list_table
AS VALUES
(struct(1, 2), [10, 20, 30]),
(struct(3, 4), [40, 50]);
SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
FROM (SELECT column1 AS s, column2 AS arr
FROM struct_and_list_table));
```
Second case:
```
CREATE TABLE list_struct_table
AS VALUES
([struct(1, 'a'), struct(2, 'b')]),
([struct(3, 'c')]);
query IT
SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;
```
Both queries fail with:
```
Internal error: Assertion failed: expr.is_empty():
```
### Expected behavior
Both queries work.
First query result:
```
9
```
Second query result:
```
1 a
2 b
3 c
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]