berkaysynnada commented on code in PR #15779: URL: https://github.com/apache/datafusion/pull/15779#discussion_r2052131746
########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { // This is a safe default behavior. Ok(None) } + + /// Adapt this [`PhysicalExpr`] to a new schema. + /// For example, `Column("b", 1)` can be adapted to `Column("b", 0)` + /// given the schema `Schema::new(vec![("b", DataType::Int32)])`. + fn with_schema(&self, _schema: &Schema) -> Result<Option<Arc<dyn PhysicalExpr>>> { Review Comment: I think introducing such an API is dangerous. Since during the physical planning duplicate names are possible, a column shouldn't be used in a place apart from its reference schema. I guess this need occurs because expressions need to be updated while going up and down across the plan tree. I've actually thought on that, and have an idea: There should be an API at the ExecutionPlan level, like update_expression(&self, expr: Arc<dyn PhysicalExpr>, down: bool) -> Option<Arc<dyn PhysicalExpr>> if `down` is true, it means that this expression should be valid at the **output** of this operator, and we need that expression in a way that how it's represented at the input schema of this operator. If the input schema doesn't have any part of this expression, or the expression cannot be built on the output schema columns of this operator, then the function return None. Likewise, if the down flag is false, this means that this expression references to the input schema of this operator, and we request it corresponding to the output schema of this operator. Similar failure cases returns None here as well. ########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { // This is a safe default behavior. Ok(None) } + + /// Adapt this [`PhysicalExpr`] to a new schema. + /// For example, `Column("b", 1)` can be adapted to `Column("b", 0)` + /// given the schema `Schema::new(vec![("b", DataType::Int32)])`. + fn with_schema(&self, _schema: &Schema) -> Result<Option<Arc<dyn PhysicalExpr>>> { Review Comment: BTW, a similar approach is done in projection_pushdown rule, but this design is better IMO ########## datafusion/physical-expr/src/expressions/column.rs: ########## @@ -141,6 +141,18 @@ impl PhysicalExpr for Column { fn fmt_sql(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { write!(f, "{}", self.name) } + + fn with_schema(&self, schema: &Schema) -> Result<Option<Arc<dyn PhysicalExpr>>> { + // Find our index in the new schema + let new_index = schema.index_of(&self.name)?; Review Comment: This `index_of` works with `find()`, which have caused many bugs on tables having same named columns -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org