berkaysynnada commented on code in PR #15779:
URL: https://github.com/apache/datafusion/pull/15779#discussion_r2052131746


##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + 
DynEq + DynHash {
         // This is a safe default behavior.
         Ok(None)
     }
+
+    /// Adapt this [`PhysicalExpr`] to a new schema.
+    /// For example, `Column("b", 1)` can be adapted to `Column("b", 0)`
+    /// given the schema `Schema::new(vec![("b", DataType::Int32)])`.
+    fn with_schema(&self, _schema: &Schema) -> Result<Option<Arc<dyn 
PhysicalExpr>>> {

Review Comment:
   I think introducing such an API is dangerous. Since during the physical 
planning duplicate names are possible, a column shouldn't be used in a place 
apart from its reference schema. 
   
   I guess this need occurs because expressions need to be updated while going 
up and down across the plan tree.  I've actually thought on that, and have an 
idea:
   
   There should be an API at the ExecutionPlan level, like 
update_expression(&self, expr: Arc<dyn PhysicalExpr>, down: bool) -> 
Option<Arc<dyn PhysicalExpr>>
   
   if `down` is true, it means that this expression should be valid at the 
**output** of this operator, and we need that expression in a way that how it's 
represented at the input schema of this operator. If the input schema doesn't 
have any part of this expression, or the expression cannot be built on the 
output schema columns of this operator, then the function return None.
   
   Likewise, if the down flag is false, this means that this expression 
references to the input schema of this operator, and we request it 
corresponding to the output schema of this operator. Similar failure cases 
returns None here as well.
   



##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + 
DynEq + DynHash {
         // This is a safe default behavior.
         Ok(None)
     }
+
+    /// Adapt this [`PhysicalExpr`] to a new schema.
+    /// For example, `Column("b", 1)` can be adapted to `Column("b", 0)`
+    /// given the schema `Schema::new(vec![("b", DataType::Int32)])`.
+    fn with_schema(&self, _schema: &Schema) -> Result<Option<Arc<dyn 
PhysicalExpr>>> {

Review Comment:
   BTW, a similar approach is done in projection_pushdown rule, but this design 
is better IMO



##########
datafusion/physical-expr/src/expressions/column.rs:
##########
@@ -141,6 +141,18 @@ impl PhysicalExpr for Column {
     fn fmt_sql(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         write!(f, "{}", self.name)
     }
+
+    fn with_schema(&self, schema: &Schema) -> Result<Option<Arc<dyn 
PhysicalExpr>>> {
+        // Find our index in the new schema
+        let new_index = schema.index_of(&self.name)?;

Review Comment:
   This `index_of` works with `find()`, which have caused many bugs on tables 
having same named columns



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to