alamb commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r1999392980
##########
datafusion/expr/src/expr.rs:
##########
@@ -2607,11 +2793,23 @@ pub(crate) fn
schema_name_from_exprs_comma_separated_without_space(
schema_name_from_exprs_inner(exprs, ",")
}
+/// Get `sql_name` for Vector of expressions.
Review Comment:
Instead of adding so many new so specific functions, what do you think about
adding something like the following. I think the API would be easier to
understand and it would also let us save a `String` copy when formatting as sql
```rust
/// Formats a list of `&Expr` with a custom separator
struct ExprListDisplay<'a> {
exprs: &'a [Expr],
sep: &'a str,
}
...
impl <'a> Display for ExprListDisplay<'a> {
...
}
```
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -801,6 +803,16 @@ impl DisplayAs for AggregateExec {
}
}
DisplayFormatType::TreeRender => {
+ let format_expr_with_alias =
+ |(e, alias): &(Arc<dyn PhysicalExpr>, String)| -> String {
+ let expr_sql = fmt_sql(e.as_ref()).to_string();
Review Comment:
I think you can avoid a string copy here like:
```suggestion
let expr_sql = fmt_sql(e.as_ref());
```
And then directly wrote them to `f` rather than creating strings
However I don't think this is performance critical code so it is also ok
this way too (and it follows the previous pattern)
Maybe as a follow on PR we can avoid as many copies in this code
##########
datafusion/expr/src/expr.rs:
##########
@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> {
}
}
+struct SqlDisplay<'a>(&'a Expr);
+impl Display for SqlDisplay<'_> {
Review Comment:
We already have code to convert from `Expr` to SQL, in
https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html#method.expr_to_sql
However, datafusion-expr doesn't currently depend on the datafusion-sql
crate which is probably a good thing
So I think we should add a note in `Expr::sql_name` that if the user's want
syntactically correct SQL to feed to some other system, they should use the
`Unparser` and leave a link to
https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html
##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
/// Output / return type of this aggregate
data_type: DataType,
name: String,
Review Comment:
```suggestion
/// Output column name that this expression creates
name: String,
```
##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
/// Output / return type of this aggregate
data_type: DataType,
name: String,
+ sql_name: String,
Review Comment:
I am worried about adding this new String for every single
AggregateFunctionExpr even when we are not printing an explain plan -- we are
trying try to keep DataFusion's planning time from getting out of hand so if we
can avoid forcing a new String that would be better
##########
datafusion/sqllogictest/test_files/explain_tree.slt:
##########
@@ -204,53 +204,50 @@ physical_plan
04)│ aggr: │
05)│ sum(table1.bigint_col) │
06)│ │
-07)│ group_by: │
-08)│ string_col@0 as string_col│
-09)│ │
-10)│ mode: │
-11)│ FinalPartitioned │
-12)└─────────────┬─────────────┘
-13)┌─────────────┴─────────────┐
-14)│ CoalesceBatchesExec │
-15)│ -------------------- │
-16)│ target_batch_size: │
-17)│ 8192 │
-18)└─────────────┬─────────────┘
-19)┌─────────────┴─────────────┐
-20)│ RepartitionExec │
-21)│ -------------------- │
-22)│ output_partition_count: │
-23)│ 4 │
-24)│ │
-25)│ partitioning_scheme: │
-26)│ Hash([string_col@0], 4) │
-27)└─────────────┬─────────────┘
-28)┌─────────────┴─────────────┐
-29)│ AggregateExec │
-30)│ -------------------- │
-31)│ aggr: │
-32)│ sum(table1.bigint_col) │
-33)│ │
-34)│ group_by: │
-35)│ string_col@0 as string_col│
-36)│ │
-37)│ mode: Partial │
-38)└─────────────┬─────────────┘
-39)┌─────────────┴─────────────┐
-40)│ RepartitionExec │
-41)│ -------------------- │
-42)│ output_partition_count: │
-43)│ 1 │
-44)│ │
-45)│ partitioning_scheme: │
-46)│ RoundRobinBatch(4) │
-47)└─────────────┬─────────────┘
-48)┌─────────────┴─────────────┐
-49)│ DataSourceExec │
-50)│ -------------------- │
-51)│ files: 1 │
-52)│ format: csv │
-53)└───────────────────────────┘
+07)│ group_by: string_col │
Review Comment:
💯
##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
/// Output / return type of this aggregate
data_type: DataType,
name: String,
+ sql_name: String,
Review Comment:
I played around with it and I couldn't find any way to avoid this.
```suggestion
/// Simplified name for `tree` explain.
sql_name: String,
```
##########
datafusion/sqllogictest/test_files/explain_tree.slt:
##########
@@ -1477,69 +1474,60 @@ physical_plan
07)┌─────────────┴─────────────┐
08)│ AggregateExec │
09)│ -------------------- │
-10)│ aggr: count(Int64(1)) │
-11)│ │
-12)│ group_by: │
-13)│ name@0 as name │
-14)│ │
-15)│ mode: │
-16)│ SinglePartitioned │
-17)└─────────────┬─────────────┘
-18)┌─────────────┴─────────────┐
-19)│ InterleaveExec ├──────────────┐
-20)└─────────────┬─────────────┘ │
-21)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-22)│ AggregateExec ││ AggregateExec │
-23)│ -------------------- ││ -------------------- │
-24)│ aggr ││ aggr │
-25)│ ││ │
-26)│ group_by: ││ group_by: │
-27)│ name@0 as name ││ name@0 as name │
-28)│ ││ │
-29)│ mode: ││ mode: │
-30)│ FinalPartitioned ││ FinalPartitioned │
-31)└─────────────┬─────────────┘└─────────────┬─────────────┘
-32)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-33)│ CoalesceBatchesExec ││ CoalesceBatchesExec │
-34)│ -------------------- ││ -------------------- │
-35)│ target_batch_size: ││ target_batch_size: │
-36)│ 8192 ││ 8192 │
-37)└─────────────┬─────────────┘└─────────────┬─────────────┘
-38)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-39)│ RepartitionExec ││ RepartitionExec │
-40)│ -------------------- ││ -------------------- │
-41)│ output_partition_count: ││ output_partition_count: │
-42)│ 4 ││ 4 │
-43)│ ││ │
-44)│ partitioning_scheme: ││ partitioning_scheme: │
-45)│ Hash([name@0], 4) ││ Hash([name@0], 4) │
-46)└─────────────┬─────────────┘└─────────────┬─────────────┘
-47)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-48)│ RepartitionExec ││ RepartitionExec │
-49)│ -------------------- ││ -------------------- │
-50)│ output_partition_count: ││ output_partition_count: │
-51)│ 1 ││ 1 │
-52)│ ││ │
-53)│ partitioning_scheme: ││ partitioning_scheme: │
-54)│ RoundRobinBatch(4) ││ RoundRobinBatch(4) │
-55)└─────────────┬─────────────┘└─────────────┬─────────────┘
-56)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-57)│ AggregateExec ││ AggregateExec │
-58)│ -------------------- ││ -------------------- │
-59)│ aggr ││ aggr │
-60)│ ││ │
-61)│ group_by: ││ group_by: │
-62)│ name@0 as name ││ name@0 as name │
-63)│ ││ │
-64)│ mode: Partial ││ mode: Partial │
-65)└─────────────┬─────────────┘└─────────────┬─────────────┘
-66)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-67)│ DataSourceExec ││ DataSourceExec │
-68)│ -------------------- ││ -------------------- │
-69)│ bytes: 1320 ││ bytes: 1312 │
-70)│ format: memory ││ format: memory │
-71)│ rows: 1 ││ rows: 1 │
-72)└───────────────────────────┘└───────────────────────────┘
+10)│ aggr: count(1) │
Review Comment:
this is wonderful
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]