berkaysynnada commented on code in PR #6566:
URL: https://github.com/apache/arrow-datafusion/pull/6566#discussion_r1236971325
##########
datafusion/sql/src/select.rs:
##########
@@ -194,7 +198,13 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
.iter()
.map(|expr| rebase_expr(expr, &window_func_exprs, &plan))
.collect::<Result<Vec<Expr>>>()?;
-
+ if select.into.is_some() {
+ for expr in select_exprs_post_aggr.iter_mut() {
+ if let Expr::Column(_) = expr.clone() {
+ *expr = expr.clone().alias(physical_name(expr)?);
+ }
+ }
+ }
Review Comment:
Sorry for the late reply. I have tried both suggestions @alamb:
1) We need to modify 3 parts of the code:
a- In the `project()` function in select.rs, the final schema will be
constructed with a shortened form of the window function.
```
fn to_field(&self, input_schema: &DFSchema) -> Result<DFField> {
match self {
Expr::Column(c) => Ok(DFField::new(
c.relation.clone(),
&c.name,
self.get_type(input_schema)?,
self.nullable(input_schema)?,
)),
_ => {
Ok(DFField::new_unqualified(
&self.display_name()?,
self.get_type(input_schema)?,
self.nullable(input_schema)?,
))
}
}
}
```
is expanded with that arm:
```
Expr::WindowFunction(WindowFunction { fun, args, .. }) => {
Ok(DFField::new_unqualified(
&vec![create_function_name(&fun.to_string(), false,
args)?].join(" "),
self.get_type(input_schema)?,
self.nullable(input_schema)?,
))
}
```
b- In the `project()` function again, qualified wildcard columns are
normalized. However, the column name is in the longer form, and the schema of
the plan is in the shorter form. Therefore, we also change the
`expr_as_column_expr()` function so that window function expressions are
converted to column expressions with the shortened column name format, which
can be copied from the schema.
c- `PushDownProjection` rule again creates new column expressions with
`display_name() `function (which returns the long format) in the window
handling arm. These column names also need to be shortened to satisfy
subsequent control.
**My opinion:** Can we directly change the `display_name()` function for
window functions such that only function name and arguments are returned? Thus
we don't need to change any of what I mentioned above.
```
Expr::WindowFunction(WindowFunction {
fun, args, window_frame, partition_by, order_by,
}) => {
let mut parts: Vec<String> =
vec![create_function_name(&fun.to_string(), false, args)?];
if !partition_by.is_empty() {
parts.push(format!("PARTITION BY {partition_by:?}"));
}
if !order_by.is_empty() {
parts.push(format!("ORDER BY {order_by:?}"));
}
parts.push(format!("{window_frame}"));
Ok(parts.join(" "))
}
```
new version:
```
Expr::WindowFunction(WindowFunction {
fun, args, ..
}) => {
let mut parts: Vec<String> =
vec![create_function_name(&fun.to_string(), false, args)?];
Ok(parts.join(" "))
}
```
2) We can only change `create_window_expr()` such that it creates the window
name with `display_name()` rather than `physical_name()`, but it makes the
plans longer, also lots of test change burden.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]