alamb commented on code in PR #11797:
URL: https://github.com/apache/datafusion/pull/11797#discussion_r1706039120
##########
datafusion/expr/src/udf.rs:
##########
@@ -345,12 +351,22 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
fn name(&self) -> &str;
/// Returns the user-defined display name of the UDF given the arguments
- ///
fn display_name(&self, args: &[Expr]) -> Result<String> {
- let names: Vec<String> =
args.iter().map(create_name).collect::<Result<_>>()?;
+ let names: Vec<String> =
args.iter().map(ToString::to_string).collect();
+ // TODO: join with ", " to standardize the formatting of Vec<Expr>,
<https://github.com/apache/datafusion/issues/10364>
Ok(format!("{}({})", self.name(), names.join(",")))
}
+ /// Returns the user-defined schema name of the UDF given the arguments
Review Comment:
```suggestion
/// Returns the name of the column this expression would create
///
/// See [`Expr::schema_name`] for details
```
Maybe it is also worth doing the `impl Display` thing here too which would
allow avoiding these string allocations
##########
datafusion/expr/src/expr.rs:
##########
@@ -983,10 +983,290 @@ impl PartialOrd for Expr {
}
impl Expr {
- /// Returns the name of this expression as it should appear in a schema.
This name
- /// will not include any CAST expressions.
- pub fn display_name(&self) -> Result<String> {
- create_name(self)
+ /// Returns the name for schema / field that is different from Display
+ /// Most of the expressions are the same as Display
+ /// Those are the main difference
+ /// 1. Alias, where excludes expression
+ /// 2. Cast / TryCast, where takes expression only
Review Comment:
Here is a suggesiton for how to improve this documentation:
```suggestion
/// The name of the column (field) that this `Expr` will produce.
///
/// For example, for a projection (e.g. `SELECT <expr>`) the resulting
arrow
/// [`Schema`] will have a field with this name.
///
/// Note that the resulting string is subtlety different than the
`Display`
/// representation for certain `Expr`. Some differences:
///
/// 1. [`Expr::Alias`], which shows only the alias itself
/// 2. [`Expr::Cast`] / [`Expr::TryCast`], which only displays the
expression
///
/// # Example
/// ```
/// # use datafusion_expr::{col, lit};
/// let expr = col("foo").eq(lit(42));
/// assert_eq!("foo = Int32(42)", expr.schema_name().unwrap());
///
/// let expr = col("foo").alias("bar").eq(lit(11));
/// assert_eq!("bar = Int32(11)", expr.schema_name().unwrap());
/// ```
///
/// [`Schema`]: arrow::datatypes::Schema
```
##########
datafusion/expr/src/expr.rs:
##########
@@ -983,10 +983,290 @@ impl PartialOrd for Expr {
}
impl Expr {
- /// Returns the name of this expression as it should appear in a schema.
This name
- /// will not include any CAST expressions.
- pub fn display_name(&self) -> Result<String> {
- create_name(self)
+ /// Returns the name for schema / field that is different from Display
+ /// Most of the expressions are the same as Display
+ /// Those are the main difference
+ /// 1. Alias, where excludes expression
+ /// 2. Cast / TryCast, where takes expression only
+ pub fn schema_name(&self) -> Result<String> {
Review Comment:
If we are going to introduce a new API anyways, what do you think about
making an API that doesn't *require* allocating a new string and making it
infallable. Something like
```suggestion
pub fn schema_name<'a>(&'a self) -> impl Display + 'a {
```
We could probably then implement a new type thing and mostly use the same
code:
```rust
// new type wrapper
struct SchemaDisplay<'a>(&'a Expr);
impl <'a> Display for SchemaDisplay<'a> {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
match self {
// The same as Display
Expr::Column(_)
| Expr::Literal(_)
| Expr::ScalarVariable(..)
| Expr::Sort(_)
| Expr::OuterReferenceColumn(..)
| Expr::Placeholder(_)
| Expr::Wildcard { .. } => write!(f, "{self}")?,
...
}
}
return SchemaDisplay(self)
```
The benefit of doing this would be anywhere it needed to get formatted would
not require a copy
```rust
println!("This would not allocated a String: {}", expr.schema_name());
```
```rust
// get schema name as a string:
let schema_name = expr.schema_name().to_string();
##########
datafusion/substrait/tests/cases/consumer_integration.rs:
##########
@@ -358,8 +358,8 @@ mod tests {
let plan = from_substrait_plan(&ctx, &proto).await?;
let plan_str = format!("{:?}", plan);
- assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) *
sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8)
THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) /
sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\
- \n Aggregate: groupBy=[[]], aggr=[[sum(CASE WHEN
FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8) THEN
FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END),
sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount)]]\
+ assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) *
sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE Utf8(\"PROMO%\") THEN
FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) /
sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) -
FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\
Review Comment:
I think it makes sense as long as the cast is still in plans (which it is)
##########
datafusion/expr/src/expr.rs:
##########
@@ -983,10 +983,290 @@ impl PartialOrd for Expr {
}
impl Expr {
- /// Returns the name of this expression as it should appear in a schema.
This name
Review Comment:
Since this is a pub function what do you think about leaving it in marked as
deprecated to help migration?
I think this make the PR much easier to handle
Something like
```rust
#[deprecated(since = "31.0.0", note = "use data_type instead")]
pub fn display_name(&self) -> Result<String> {
self.schema_name()
}
```
##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -1137,7 +1137,7 @@ from arrays_values_without_nulls;
## array_element (aliases: array_extract, list_extract, list_element)
# Testing with empty arguments should result in an error
-query error DataFusion error: Error during planning: Error during planning:
array_element does not support zero arguments.
+query error DataFusion error: Execution error: expect 2 args, got 0
Review Comment:
the old error message was nicer -- maybe we can update the code that
generates this message to be nicer too 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]