alamb commented on code in PR #20117:
URL: https://github.com/apache/datafusion/pull/20117#discussion_r2800896258
##########
datafusion/sqllogictest/test_files/projection_pushdown.slt:
##########
@@ -954,27 +965,29 @@ EXPLAIN SELECT (id + s['value']) * (id + s['value']) as id_and_value FROM simple
----
logical_plan
01)Projection: __common_expr_1 * __common_expr_1 AS id_and_value
-02)--Projection: simple_struct.id + get_field(simple_struct.s, Utf8("value")) AS __common_expr_1
+02)--Projection: simple_struct.id + __datafusion_extracted_2 AS __common_expr_1
03)----Filter: simple_struct.id > Int64(2)
-04)------TableScan: simple_struct projection=[id, s], partial_filters=[simple_struct.id > Int64(2)]
+04)------Projection: get_field(simple_struct.s, Utf8("value")) AS __datafusion_extracted_2, simple_struct.id
+05)--------TableScan: simple_struct projection=[id, s], partial_filters=[simple_struct.id > Int64(2)]
physical_plan
01)ProjectionExec: expr=[__common_expr_1@0 * __common_expr_1@0 as id_and_value]
-02)--ProjectionExec: expr=[id@0 + get_field(s@1, value) as __common_expr_1]
-03)----FilterExec: id@0 > 2
-04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/simple.parquet]]}, projection=[id, s], file_type=parquet, predicate=id@0 > 2, pruning_predicate=id_null_count@1 != row_count@2 AND id_max@0 > 2, required_guarantees=[]
+02)--ProjectionExec: expr=[id@1 + __datafusion_extracted_2@0 as __common_expr_1]
+03)----FilterExec: id@1 > 2
+04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/simple.parquet]]}, projection=[get_field(s@1, value) as __datafusion_extracted_2, id], file_type=parquet, predicate=id@0 > 2, pruning_predicate=id_null_count@1 != row_count@2 AND id_max@0 > 2, required_guarantees=[]
query TT
EXPLAIN SELECT s['value'] + s['value'] as doubled FROM simple_struct WHERE id > 2;
----
logical_plan
-01)Projection: get_field(simple_struct.s, Utf8("value")) + get_field(simple_struct.s, Utf8("value")) AS doubled
+01)Projection: __datafusion_extracted_1 + __datafusion_extracted_1 AS doubled
02)--Filter: simple_struct.id > Int64(2)
-03)----TableScan: simple_struct projection=[id, s], partial_filters=[simple_struct.id > Int64(2)]
+03)----Projection: get_field(simple_struct.s, Utf8("value")) AS __datafusion_extracted_1, simple_struct.id
+04)------TableScan: simple_struct projection=[id, s], partial_filters=[simple_struct.id > Int64(2)]
physical_plan
-01)ProjectionExec: expr=[get_field(s@0, value) + get_field(s@0, value) as doubled]
-02)--FilterExec: id@0 > 2, projection=[s@1]
-03)----DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/simple.parquet]]}, projection=[id, s], file_type=parquet, predicate=id@0 > 2, pruning_predicate=id_null_count@1 != row_count@2 AND id_max@0 > 2, required_guarantees=[]
+01)ProjectionExec: expr=[__datafusion_extracted_1@0 + __datafusion_extracted_1@0 as doubled]
Review Comment:
This is a nice plan: it shows that the field is extracted once, below the
filter, and that the expression is then computed after the filter 👍
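
To make the "extracted once" point concrete, here is a minimal, self-contained sketch of the idea on a toy expression tree. It is not the code in this PR: the `Expr` enum, the `Extractor` type, the `doubled_example` helper, and the `__extracted_N` alias scheme are all hypothetical names chosen for illustration. It only shows how two identical `get_field` calls can be collapsed into one aliased expression that a lower projection would compute.

```rust
use std::collections::HashMap;

// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

// Collects field accesses so that each distinct one is extracted only once.
struct Extractor {
    // Maps an already-extracted expression to the alias assigned to it.
    seen: HashMap<Expr, String>,
    // (alias, expression) pairs in the order they were first encountered;
    // these are what a lower projection would evaluate.
    extracted: Vec<(String, Expr)>,
}

impl Extractor {
    fn new() -> Self {
        Extractor {
            seen: HashMap::new(),
            extracted: Vec::new(),
        }
    }

    // Returns a copy of `expr` in which every field access is replaced by a
    // column reference to its alias.
    fn rewrite(&mut self, expr: &Expr) -> Expr {
        match expr {
            Expr::GetField(..) => {
                // Reuse the alias if this exact expression was seen before.
                if let Some(alias) = self.seen.get(expr) {
                    return Expr::Column(alias.clone());
                }
                let alias = format!("__extracted_{}", self.extracted.len() + 1);
                self.seen.insert(expr.clone(), alias.clone());
                self.extracted.push((alias.clone(), expr.clone()));
                Expr::Column(alias)
            }
            Expr::Add(l, r) => {
                let left = self.rewrite(l);
                let right = self.rewrite(r);
                Expr::Add(Box::new(left), Box::new(right))
            }
            Expr::Column(_) => expr.clone(),
        }
    }
}

// Builds s['value'] + s['value'] and rewrites it.
fn doubled_example() -> (Expr, Vec<(String, Expr)>) {
    let field = Expr::GetField(
        Box::new(Expr::Column("s".to_string())),
        "value".to_string(),
    );
    let doubled = Expr::Add(Box::new(field.clone()), Box::new(field));
    let mut extractor = Extractor::new();
    let rewritten = extractor.rewrite(&doubled);
    (rewritten, extractor.extracted)
}
```

Calling `doubled_example()` yields a rewritten expression that adds one alias column to itself, plus a single extracted entry, which mirrors the shape of the `doubled` plan above: one field access in the lower projection, referenced twice in the upper one.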
##########
datafusion/optimizer/src/extract_leaf_expressions.rs:
##########
@@ -283,13 +1295,20 @@ mod tests {
TableScan: test projection=[id, user]
## After Extraction
- (same as original)
+ Projection: test.id
+ Projection: test.id, test.user
+ Filter: __datafusion_extracted_1 = Utf8("active")
+ Projection: leaf_udf(test.user, Utf8("status")) AS __datafusion_extracted_1, test.id, test.user
+ TableScan: test projection=[id, user]
## After Pushdown
(same as after extraction)
## Optimized
- (same as after pushdown)
+ Projection: test.id
+ Filter: __datafusion_extracted_1 = Utf8("active")
+ Projection: leaf_udf(test.user, Utf8("status")) AS __datafusion_extracted_1, test.id
Review Comment:
This shows the result of the rewrite pretty nicely: the extraction was
pushed below the filter (though in this case it probably makes no
difference to performance).
##########
datafusion/sqllogictest/test_files/insert.slt:
##########
@@ -165,7 +165,7 @@ ORDER BY c1
----
logical_plan
01)Dml: op=[Insert Into] table=[table_without_values]
-02)--Projection: a1 AS a1, a2 AS a2
+02)--Projection: a1, a2
Review Comment:
that is certainly nicer