Re: [PR] Reuse last projection layer when renaming columns [datafusion]

via GitHub Mon, 18 Aug 2025 06:43:05 -0700


Omega359 commented on code in PR #14684:
URL: https://github.com/apache/datafusion/pull/14684#discussion_r2282438652



##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -1972,41 +1972,85 @@ impl DataFrame {
             .config_options()
             .sql_parser
             .enable_ident_normalization;
+
         let old_column: Column = if ident_opts {
             Column::from_qualified_name(old_name)
         } else {
             Column::from_qualified_name_ignore_case(old_name)
         };
 
-        let (qualifier_rename, field_rename) =
-            match self.plan.schema().qualified_field_from_column(&old_column) {
-                Ok(qualifier_and_field) => qualifier_and_field,
-                // no-op if field not found
-                Err(DataFusionError::SchemaError(
-                    SchemaError::FieldNotFound { .. },
-                    _,
-                )) => return Ok(self),
-                Err(err) => return Err(err),
-            };
-        let projection = self
-            .plan
-            .schema()
-            .iter()
-            .map(|(qualifier, field)| {
-                if qualifier.eq(&qualifier_rename) && field.as_ref() == field_rename {
-                    (
-                        col(Column::from((qualifier, field)))
-                            .alias_qualified(qualifier.cloned(), new_name),
-                        false,
-                    )
-                } else {
-                    (col(Column::from((qualifier, field))), false)
-                }
-            })
-            .collect::<Vec<_>>();
-        let project_plan = LogicalPlanBuilder::from(self.plan)
-            .project_with_validation(projection)?
-            .build()?;
+        let project_plan = if let LogicalPlan::Projection(Projection {

Review Comment:
   While it may not be a problem exactly, I do think it may be worth it for cases where a large number of `with_column` calls (and thus projections) are happening.
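   A minimal Rust sketch of why this matters, on a hypothetical toy plan. `Expr`, `Plan`, `rename_stacking` and `rename_reusing` are illustrative names only, not DataFusion's `LogicalPlan`/`Projection` API, and column qualifiers are ignored. `rename_stacking` roughly mirrors the removed code: it wraps the input in a fresh projection that lists every field and aliases the matching one, returning the plan unchanged if the column is not found. `rename_reusing` assumes the idea in the PR title: if the top node is already a projection, rewrite its alias in place instead of adding another layer.

```rust
// Hypothetical toy model of a logical plan; these are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Name this expression produces in the output schema.
    fn output_name(&self) -> &str {
        match self {
            Expr::Column(name) => name.as_str(),
            Expr::Alias(_, alias) => alias.as_str(),
        }
    }

    /// The expression with any top-level alias stripped.
    fn unaliased(&self) -> &Expr {
        match self {
            Expr::Alias(inner, _) => &**inner,
            other => other,
        }
    }
}

#[derive(Debug, Clone)]
enum Plan {
    Scan(Vec<String>),
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
}

impl Plan {
    fn output_names(&self) -> Vec<String> {
        match self {
            Plan::Scan(cols) => cols.clone(),
            Plan::Projection { exprs, .. } => {
                exprs.iter().map(|e| e.output_name().to_string()).collect()
            }
        }
    }

    /// Number of nodes from the root down to the scan.
    fn depth(&self) -> usize {
        match self {
            Plan::Scan(_) => 1,
            Plan::Projection { input, .. } => 1 + input.depth(),
        }
    }
}

/// Stacking approach: always wrap the plan in a new projection over every field.
fn rename_stacking(plan: Plan, old: &str, new: &str) -> Plan {
    let names = plan.output_names();
    if !names.iter().any(|n| n == old) {
        return plan; // no-op if the column is not found
    }
    let exprs = names
        .into_iter()
        .map(|n| {
            if n == old {
                Expr::Alias(Box::new(Expr::Column(n)), new.to_string())
            } else {
                Expr::Column(n)
            }
        })
        .collect();
    Plan::Projection { exprs, input: Box::new(plan) }
}

/// Reusing approach: if the top node is a projection, re-alias its expression in place.
fn rename_reusing(plan: Plan, old: &str, new: &str) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => {
            let exprs = exprs
                .into_iter()
                .map(|e| {
                    if e.output_name() == old {
                        Expr::Alias(Box::new(e.unaliased().clone()), new.to_string())
                    } else {
                        e
                    }
                })
                .collect();
            Plan::Projection { exprs, input }
        }
        // No projection to reuse yet: fall back to adding one.
        other => rename_stacking(other, old, new),
    }
}

fn main() {
    let scan = Plan::Scan(vec!["a".to_string(), "b".to_string(), "c".to_string()]);
    let mut stacked = scan.clone();
    let mut reused = scan;
    // 100 chained renames: a -> a1 -> a2 -> ... -> a100
    for i in 0..100 {
        let old = if i == 0 { "a".to_string() } else { format!("a{}", i) };
        let new = format!("a{}", i + 1);
        stacked = rename_stacking(stacked, &old, &new);
        reused = rename_reusing(reused, &old, &new);
    }
    // Both plans expose the same output columns.
    assert_eq!(stacked.output_names(), reused.output_names());
    println!("stacked depth: {}", stacked.depth()); // prints "stacked depth: 101"
    println!("reused depth: {}", reused.depth()); // prints "reused depth: 2"
}
```

   Under these assumptions, 100 chained renames leave the stacked plan 101 nodes deep, while the reusing plan stays 2 deep no matter how many renames follow.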
   
   You can see from a [sample instrumentation](https://gist.github.com/Omega359/bf96eff97e6dc784e0cfd6a81bfa7b67) of a DataFrame operation I perform that handling the projections takes a long time:
   
   `Optimization for rule optimize_projections took > 50ms: 22821 ms`
   
   The logical plan is 31 MB as text (`.display_indent()`), and you can see just how deeply nested things get from the document map in Notepad++:
   <img width="1911" height="1129" alt="image" src="https://github.com/user-attachments/assets/c41fda8a-d4af-40b5-b76a-f3eabf0785f4" />
   
   I posted about improving planning performance via multithreading in the [Discord channel](https://discord.com/channels/885562378132000778/1166447479609376850/1406384187228819680), but from a quick look I'm not sure that approach would be a simple change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


