Omega359 commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r2282438652
##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -1972,41 +1972,85 @@ impl DataFrame {
             .config_options()
             .sql_parser
             .enable_ident_normalization;
+
         let old_column: Column = if ident_opts {
             Column::from_qualified_name(old_name)
         } else {
             Column::from_qualified_name_ignore_case(old_name)
         };
-        let (qualifier_rename, field_rename) =
-            match self.plan.schema().qualified_field_from_column(&old_column) {
-                Ok(qualifier_and_field) => qualifier_and_field,
-                // no-op if field not found
-                Err(DataFusionError::SchemaError(
-                    SchemaError::FieldNotFound { .. },
-                    _,
-                )) => return Ok(self),
-                Err(err) => return Err(err),
-            };
-        let projection = self
-            .plan
-            .schema()
-            .iter()
-            .map(|(qualifier, field)| {
-                if qualifier.eq(&qualifier_rename) && field.as_ref() == field_rename {
-                    (
-                        col(Column::from((qualifier, field)))
-                            .alias_qualified(qualifier.cloned(), new_name),
-                        false,
-                    )
-                } else {
-                    (col(Column::from((qualifier, field))), false)
-                }
-            })
-            .collect::<Vec<_>>();
-        let project_plan = LogicalPlanBuilder::from(self.plan)
-            .project_with_validation(projection)?
-            .build()?;
+        let project_plan = if let LogicalPlan::Projection(Projection {

Review Comment:
While it may not exactly be a problem, I do think it may be worth it for cases where a large number of `with_column` calls (and thus projections) are happening.
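To make the nesting concern concrete, here is a minimal sketch in plain Rust. It is a toy model and not DataFusion code: `Plan`, `rename_stacking`, `rename_merging` and `rename_chain` are hypothetical names invented for illustration, and the "merge into an existing projection" strategy is only my assumption about what a branch on `LogicalPlan::Projection` could buy us.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's LogicalPlan.
#[derive(Debug, Clone)]
enum Plan {
    Scan { columns: Vec<String> },
    // Each entry is (input column, output alias).
    Projection { exprs: Vec<(String, String)>, input: Box<Plan> },
}

impl Plan {
    /// Output column names of this node.
    fn columns(&self) -> Vec<String> {
        match self {
            Plan::Scan { columns } => columns.clone(),
            Plan::Projection { exprs, .. } => {
                exprs.iter().map(|(_, alias)| alias.clone()).collect()
            }
        }
    }

    /// Number of Projection nodes stacked above the scan.
    fn projection_depth(&self) -> usize {
        match self {
            Plan::Scan { .. } => 0,
            Plan::Projection { input, .. } => 1 + input.projection_depth(),
        }
    }
}

/// Naive rename: always wraps the current plan in a new Projection.
fn rename_stacking(plan: Plan, old: &str, new: &str) -> Plan {
    let exprs = plan
        .columns()
        .into_iter()
        .map(|c| {
            let alias = if c.as_str() == old { new.to_string() } else { c.clone() };
            (c, alias)
        })
        .collect();
    Plan::Projection { exprs, input: Box::new(plan) }
}

/// Merging rename: if the plan is already a Projection, rewrite the alias
/// inside it instead of adding another node on top.
fn rename_merging(plan: Plan, old: &str, new: &str) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => Plan::Projection {
            exprs: exprs
                .into_iter()
                .map(|(src, alias)| {
                    if alias.as_str() == old { (src, new.to_string()) } else { (src, alias) }
                })
                .collect(),
            input,
        },
        other => rename_stacking(other, old, new),
    }
}

/// Applies `n` successive renames (c0 -> c1 -> ... -> cn) with the given strategy.
fn rename_chain(n: usize, rename: fn(Plan, &str, &str) -> Plan) -> Plan {
    let mut plan = Plan::Scan { columns: vec!["c0".to_string(), "other".to_string()] };
    for i in 0..n {
        plan = rename(plan, &format!("c{}", i), &format!("c{}", i + 1));
    }
    plan
}
```

In this model, `rename_chain(100, rename_stacking)` yields 100 nested projection nodes, while `rename_chain(100, rename_merging)` yields a single one with the same output columns. Any optimizer rule that walks the plan has proportionally less to traverse in the second case.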
You can see from a [sample instrumentation](https://gist.github.com/Omega359/bf96eff97e6dc784e0cfd6a81bfa7b67) of a dataframe operation I run that handling the projections takes a long time:

`Optimization for rule optimize_projections took > 50ms: 22821 ms`

The logical plan is 31MB as text (`.display_indent()`), and you can see just how nested things can get from the document map in Notepad++:

<img width="1911" height="1129" alt="image" src="https://github.com/user-attachments/assets/c41fda8a-d4af-40b5-b76a-f3eabf0785f4" />

I posted about improving the planning performance via multithreading in the [discord channel](https://discord.com/channels/885562378132000778/1166447479609376850/1406384187228819680), but from a quick look I'm not sure that approach will necessarily be a simple change.