Omega359 commented on code in PR #14684:
URL: https://github.com/apache/datafusion/pull/14684#discussion_r2282438652
##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -1972,41 +1972,85 @@ impl DataFrame {
.config_options()
.sql_parser
.enable_ident_normalization;
+
let old_column: Column = if ident_opts {
Column::from_qualified_name(old_name)
} else {
Column::from_qualified_name_ignore_case(old_name)
};
- let (qualifier_rename, field_rename) =
- match self.plan.schema().qualified_field_from_column(&old_column) {
- Ok(qualifier_and_field) => qualifier_and_field,
- // no-op if field not found
- Err(DataFusionError::SchemaError(
- SchemaError::FieldNotFound { .. },
- _,
- )) => return Ok(self),
- Err(err) => return Err(err),
- };
- let projection = self
- .plan
- .schema()
- .iter()
- .map(|(qualifier, field)| {
- if qualifier.eq(&qualifier_rename) && field.as_ref() == field_rename {
- (
- col(Column::from((qualifier, field)))
- .alias_qualified(qualifier.cloned(), new_name),
- false,
- )
- } else {
- (col(Column::from((qualifier, field))), false)
- }
- })
- .collect::<Vec<_>>();
- let project_plan = LogicalPlanBuilder::from(self.plan)
- .project_with_validation(projection)?
- .build()?;
+ let project_plan = if let LogicalPlan::Projection(Projection {
Review Comment:
While it may not be a problem exactly, I do think it is worth it for
cases where a large number of `with_column` calls (and thus projections)
are happening. You can see from a [sample
instrumentation](https://gist.github.com/Omega359/bf96eff97e6dc784e0cfd6a81bfa7b67)
of a dataframe operation I run that handling the projections takes a long
time:
`Optimization for rule optimize_projections took > 50ms: 22821 ms`
The logical plan is 31MB as text (`.display_indent()`), and you can see just
how deeply nested things get from the document map in Notepad++:
<img width="1911" height="1129" alt="image"
src="https://github.com/user-attachments/assets/c41fda8a-d4af-40b5-b76a-f3eabf0785f4"
/>
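To make the nesting argument concrete, here is a minimal sketch using a hypothetical toy plan type (`Plan`, `rename_stacked`, `rename_merged` and `apply_renames` are illustrative names, not DataFusion's `LogicalPlan`/`Expr` API). It compares stacking a new projection on every rename, which is roughly what the removed code does, against rewriting the output names of an input that is already a projection, which is the general idea the new `if let LogicalPlan::Projection(...)` branch appears to take. It deliberately ignores qualifiers, name collisions and non-column expressions that a real implementation would have to handle.

```rust
// Hypothetical toy model of a logical plan; NOT DataFusion's types.
#[derive(Debug, Clone)]
enum Plan {
    Scan { columns: Vec<String> },
    // Each expr is (source column, output name).
    Projection { exprs: Vec<(String, String)>, input: Box<Plan> },
}

impl Plan {
    /// Names of the columns this node produces.
    fn output_columns(&self) -> Vec<String> {
        match self {
            Plan::Scan { columns } => columns.clone(),
            Plan::Projection { exprs, .. } => exprs.iter().map(|(_, out)| out.clone()).collect(),
        }
    }

    /// Number of nodes from this node down to the scan.
    fn depth(&self) -> usize {
        match self {
            Plan::Scan { .. } => 1,
            Plan::Projection { input, .. } => 1 + input.depth(),
        }
    }
}

/// Stacking strategy: always wrap the input in a new projection.
fn rename_stacked(plan: Plan, old: &str, new: &str) -> Plan {
    let exprs = plan
        .output_columns()
        .into_iter()
        .map(|c| {
            let out = if c == old { new.to_string() } else { c.clone() };
            (c, out)
        })
        .collect();
    Plan::Projection { exprs, input: Box::new(plan) }
}

/// Merging strategy: if the input is already a projection, rename its
/// outputs in place instead of adding another level.
fn rename_merged(plan: Plan, old: &str, new: &str) -> Plan {
    match plan {
        Plan::Projection { mut exprs, input } => {
            for (_, out) in exprs.iter_mut() {
                if out.as_str() == old {
                    *out = new.to_string();
                }
            }
            Plan::Projection { exprs, input }
        }
        other => rename_stacked(other, old, new),
    }
}

/// Apply n successive renames (a -> a1 -> a2 -> ...) with both strategies.
fn apply_renames(scan: Plan, n: usize) -> (Plan, Plan) {
    let mut stacked = scan.clone();
    let mut merged = scan;
    for i in 0..n {
        let old = if i == 0 { "a".to_string() } else { format!("a{}", i) };
        let new = format!("a{}", i + 1);
        stacked = rename_stacked(stacked, &old, &new);
        merged = rename_merged(merged, &old, &new);
    }
    (stacked, merged)
}

fn main() {
    let scan = Plan::Scan { columns: vec!["a".to_string(), "b".to_string()] };
    let (stacked, merged) = apply_renames(scan, 100);
    // Prints "stacked depth = 101, merged depth = 2".
    println!("stacked depth = {}, merged depth = {}", stacked.depth(), merged.depth());
    println!("columns = {:?}", merged.output_columns());
}
```

Both strategies expose the same output columns, but the stacked plan grows by one level per call, so any rule that walks the whole tree (as `optimize_projections` does) has proportionally more nodes to visit.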
I posted about improving planning performance via multithreading in the
[Discord
channel](https://discord.com/channels/885562378132000778/1166447479609376850/1406384187228819680),
but from a quick look I'm not sure that approach will be a simple
change.