gruuya commented on issue #8008:
URL: 
https://github.com/apache/arrow-datafusion/issues/8008#issuecomment-1793470221

   > Somehow, [Projection](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) doesn't seem to have this problem.
   
   Yeah, I think this problem only surfaces when optimization rewrites an
   initial plan into a combination of other plans, one of which is an
   aggregation, while the result still has to abide by the original plan's
   schema (as is the case for all logical optimizations). `DISTINCT ON` is one
   such case. Bare `DISTINCT` is not, since there the grouping fields have the
   same relation/name as the initial selection list.
   
   > Shouldn't it be something more like this (I can't remember if you have to 
explicitly build qualified identifiers specially):
   
   Not really, because the name of the alias [always gets crammed](https://github.com/apache/arrow-datafusion/blob/656c6a93fadcec7bc43a8a881dfaf55388b0b5c6/datafusion/expr/src/expr_schema.rs#L285-L305) into the `DFField::name`:
   ```text
     left: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" 
}), field: Field { name: "a", data_type: UInt32, nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { 
table: "test" }), field: Field { name: "b", data_type: UInt32, nullable: false, 
dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: 
Some(Bare { table: "test" }), field: Field { name: "c", data_type: UInt32, 
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], 
metadata: {}, functional_dependencies: FunctionalDependencies { deps: [] } }
    right: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" 
}), field: Field { name: "a", data_type: UInt32, nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: 
Field { name: "test.b", data_type: UInt32, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: 
Field { name: "test.c", data_type: UInt32, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }], metadata: {}, 
functional_dependencies: FunctionalDependencies { deps: [FunctionalDependence { 
source_indices: [0], target_indices: [0, 1, 2], nullable: false, mode: Single 
}] } }
   ``` 
   In fact, that's why I opened this issue: I currently see no way of building
   qualified fields via expressions.
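To make the mismatch in the dump above concrete, here is a minimal, self-contained sketch. `ToyField` and `field_from_alias` are hypothetical stand-ins invented for illustration, not DataFusion types or functions; they only model the behaviour described above, where the whole alias string ends up in the field name and the qualifier is left empty.

```rust
/// Toy stand-in for DataFusion's `DFField`: an optional relation plus a name.
/// This is an illustrative type, not the real one.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>,
    name: String,
}

impl ToyField {
    /// A field that belongs to a relation, e.g. `test` + `b`.
    fn qualified(qualifier: &str, name: &str) -> Self {
        ToyField {
            qualifier: Some(qualifier.to_string()),
            name: name.to_string(),
        }
    }

    /// A field with no relation at all.
    fn unqualified(name: &str) -> Self {
        ToyField {
            qualifier: None,
            name: name.to_string(),
        }
    }
}

/// Models the behaviour described above (an assumption of this sketch):
/// the alias string is used verbatim as the field name, with no qualifier.
fn field_from_alias(alias: &str) -> ToyField {
    ToyField::unqualified(alias)
}
```

Under this model, aliasing a column as `"test.b"` yields an unqualified field whose name is literally `test.b`, which does not compare equal to the qualified field `test` / `b` from the original schema.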
   
   `LogicalPlan::SubqueryAlias` does address the problem partially, but it is
   not a general solution: the base plan could involve joins or derived fields,
   so the output schema may include fields with no qualifier as well as fields
   with different qualifiers, and there's no adequate way to match that schema.
   If, on the other hand, there were a solution like the one above,
   `ExprSchemable::to_field` could be extended so that, besides the name, a
   relation could be passed on for each individual field, e.g. something like:
   ```diff
   diff --git a/datafusion/expr/src/expr_schema.rs 
b/datafusion/expr/src/expr_schema.rs
   index 025b74eb5..61d1ffe53 100644
   --- a/datafusion/expr/src/expr_schema.rs
   +++ b/datafusion/expr/src/expr_schema.rs
   @@ -295,6 +295,13 @@ impl ExprSchemable for Expr {
                    self.nullable(input_schema)?,
                )
                .with_metadata(self.metadata(input_schema)?)),
   +            Expr::QualifiedAlias(qa) => Ok(DFField::new(
   +                qa.relation.clone(),
   +                &qa.name,
   +                self.get_type(input_schema)?,
   +                self.nullable(input_schema)?,
   +            )
   +                .with_metadata(self.metadata(input_schema)?)),
                _ => Ok(DFField::new_unqualified(
                    &self.display_name()?,
                    self.get_type(input_schema)?,
   ```
   
   and so the correct projection would look something like:
   ```rust
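           // `alias_qualified` is the hypothetical method proposed in this issue.
           // Clone the plan so its schema can still be borrowed inside `project`.
           let projection_plan = LogicalPlanBuilder::from(new_plan.clone())
               .project(
                   new_plan
                       .schema()
                       .fields()
                       .iter()
                       .zip(original_plan.schema().fields().iter())
                       .map(|(new_field, old_field)| {
                           // Read the rewritten column, expose it under the original qualifier and name.
                           col(new_field.name())
                               .alias_qualified(old_field.qualifier(), old_field.name())
                       }),
               )?
               .build()?;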
   ```
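The zip-and-rename idea in the snippet above can be sketched without DataFusion at all. Everything below (`ToyField`, `ProjectedColumn`, `realign`) is a hypothetical model made up for illustration. It assumes a qualified alias would let each output field carry the original qualifier and name while still reading from the rewritten plan's column.

```rust
/// Toy stand-in for a schema field: optional relation plus name.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>,
    name: String,
}

/// One entry of the toy projection: which column of the rewritten plan is
/// read, and which field the projection exposes in its output schema.
#[derive(Debug, Clone, PartialEq)]
struct ProjectedColumn {
    source: String,
    output: ToyField,
}

/// Pairs fields positionally, as the `zip` in the snippet above does, and
/// gives each output the qualifier and name of the original field.
fn realign(new_fields: &[ToyField], original_fields: &[ToyField]) -> Vec<ProjectedColumn> {
    new_fields
        .iter()
        .zip(original_fields.iter())
        .map(|(new_field, old_field)| ProjectedColumn {
            // Read the column under the name the rewritten plan produced.
            source: new_field.name.clone(),
            // Expose it under the original relation and name.
            output: old_field.clone(),
        })
        .collect()
}
```

With the two schemas from the dump (`test.a` qualified, `test.b` and `test.c` unqualified on the rewritten side), the output fields of `realign` match the original schema field for field, which is the property the optimizer needs to preserve.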
   

