gruuya commented on issue #8008: URL: https://github.com/apache/arrow-datafusion/issues/8008#issuecomment-1793470221
> Somehow, [Projection](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) doesn't seem to have this problem.

Yeah, I think this problem only surfaces when an initial plan gets transformed into a combination of other plans (one of which is an aggregation) during optimizations, such as in the case of `DISTINCT ON` (though not bare `DISTINCT`, since then the grouping fields have the same relation/name as the initial selection list), but we still want to abide by the original plan schema (as is the case in all logical optimizations).

> Shouldn't it be something more like this (I can't remember if you have to explicitly build qualified identifiers specially):

Not really, because the name of the alias [always gets crammed](https://github.com/apache/arrow-datafusion/blob/656c6a93fadcec7bc43a8a881dfaf55388b0b5c6/datafusion/expr/src/expr_schema.rs#L285-L305) into the `DFField::name`:

```text
left: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "a", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "b", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "c", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [] } }
right: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "a", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "test.b", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "test.c", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [FunctionalDependence { source_indices: [0], target_indices: [0, 1, 2], nullable: false, mode: Single }] } }
```

In fact, that's why I opened this issue: I currently see no way of building qualified fields via expressions. `LogicalPlan::SubqueryAlias` does address the problem partially, but it is not a general solution. In other words, the base plan could involve joins or derived fields, so the output schema may include fields with no qualifiers as well as fields with different qualifiers, and there's no adequate way to match that schema.

Whereas if there were a solution like the above, `ExprSchemable::to_fields` could be extended so that, for each individual field, a relation could be passed on besides the name, e.g. something like:

```diff
diff --git a/datafusion/expr/src/expr_schema.rs b/datafusion/expr/src/expr_schema.rs
index 025b74eb5..61d1ffe53 100644
--- a/datafusion/expr/src/expr_schema.rs
+++ b/datafusion/expr/src/expr_schema.rs
@@ -295,6 +295,13 @@ impl ExprSchemable for Expr {
                     self.nullable(input_schema)?,
                 )
                 .with_metadata(self.metadata(input_schema)?)),
+            Expr::QualifiedAlias(qa) => Ok(DFField::new(
+                qa.relation.clone(),
+                &qa.name,
+                self.get_type(input_schema)?,
+                self.nullable(input_schema)?,
+            )
+            .with_metadata(self.metadata(input_schema)?)),
             _ => Ok(DFField::new_unqualified(
                 &self.display_name()?,
                 self.get_type(input_schema)?,
```

and so the correct projection would look something like:

```rust
// `new_plan` is cloned so it can still be borrowed for its schema below.
let projection_plan = LogicalPlanBuilder::from(new_plan.clone())
    .project(
        new_plan
            .schema()
            .fields()
            .iter()
            .zip(original_plan.schema().fields().iter())
            .map(|(new_field, old_field)| {
                col(new_field.name())
                    .alias_qualified(old_field.qualifier(), old_field.name())
            }),
    )?
    .build()?;
```
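To make the mismatch concrete without pulling in DataFusion, here is a minimal, self-contained sketch. `ToyField`, `ToyExpr` and `to_field` are hypothetical stand-ins invented for illustration (they are not DataFusion types or APIs); they only model the `(qualifier, name)` pair of a schema field and assume that a plain alias always produces an unqualified field, as described above.

```rust
// Hypothetical stand-in for a schema field: just a (qualifier, name) pair.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>,
    name: String,
}

impl ToyField {
    fn qualified(qualifier: &str, name: &str) -> Self {
        ToyField {
            qualifier: Some(qualifier.to_string()),
            name: name.to_string(),
        }
    }

    fn unqualified(name: &str) -> Self {
        ToyField {
            qualifier: None,
            name: name.to_string(),
        }
    }
}

// Hypothetical stand-in for the two kinds of alias expression discussed above.
enum ToyExpr {
    // A plain alias: the whole string ends up in the field name.
    Alias { name: String },
    // The proposed qualified alias: relation and name are kept separate.
    QualifiedAlias { relation: Option<String>, name: String },
}

// Models the field an expression contributes to the output schema.
fn to_field(expr: &ToyExpr) -> ToyField {
    match expr {
        ToyExpr::Alias { name } => ToyField::unqualified(name),
        ToyExpr::QualifiedAlias { relation, name } => ToyField {
            qualifier: relation.clone(),
            name: name.clone(),
        },
    }
}

fn main() {
    // The field from the original plan schema: qualifier "test", name "b".
    let original = ToyField::qualified("test", "b");

    // A plain alias crams "test.b" into the name and leaves the qualifier empty.
    let plain = to_field(&ToyExpr::Alias {
        name: "test.b".to_string(),
    });
    assert_ne!(plain, original);

    // A qualified alias reproduces the original field exactly.
    let qualified = to_field(&ToyExpr::QualifiedAlias {
        relation: Some("test".to_string()),
        name: "b".to_string(),
    });
    assert_eq!(qualified, original);

    println!("plain: {:?}", plain);
    println!("qualified: {:?}", qualified);
}
```

The toy model ignores data types, nullability and metadata, which also differ in the dump above; it only isolates the qualifier/name part of the comparison.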
