alamb commented on code in PR #11797: URL: https://github.com/apache/datafusion/pull/11797#discussion_r1706039120
########## datafusion/expr/src/udf.rs: ########## @@ -345,12 +351,22 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { fn name(&self) -> &str; /// Returns the user-defined display name of the UDF given the arguments - /// fn display_name(&self, args: &[Expr]) -> Result<String> { - let names: Vec<String> = args.iter().map(create_name).collect::<Result<_>>()?; + let names: Vec<String> = args.iter().map(ToString::to_string).collect(); + // TODO: join with ", " to standardize the formatting of Vec<Expr>, <https://github.com/apache/datafusion/issues/10364> Ok(format!("{}({})", self.name(), names.join(","))) } + /// Returns the user-defined schema name of the UDF given the arguments Review Comment: ```suggestion /// Returns the name of the column this expression would create /// /// See [`Expr::schema_name`] for details ``` Maybe it is also worth doing the `impl Display` thing here too which would allow avoiding these string allocations ########## datafusion/expr/src/expr.rs: ########## @@ -983,10 +983,290 @@ impl PartialOrd for Expr { } impl Expr { - /// Returns the name of this expression as it should appear in a schema. This name - /// will not include any CAST expressions. - pub fn display_name(&self) -> Result<String> { - create_name(self) + /// Returns the name for schema / field that is different from Display + /// Most of the expressions are the same as Display + /// Those are the main difference + /// 1. Alias, where excludes expression + /// 2. Cast / TryCast, where takes expression only Review Comment: Here is a suggesiton for how to improve this documentation: ```suggestion /// The name of the column (field) that this `Expr` will produce. /// /// For example, for a projection (e.g. `SELECT <expr>`) the resulting arrow /// [`Schema`] will have a field with this name. /// /// Note that the resulting string is subtlety different than the `Display` /// representation for certain `Expr`. Some differences: /// /// 1. [`Expr::Alias`], which shows only the alias itself /// 2. [`Expr::Cast`] / [`Expr::TryCast`], which only displays the expression /// /// # Example /// ``` /// # use datafusion_expr::{col, lit}; /// let expr = col("foo").eq(lit(42)); /// assert_eq!("foo = Int32(42)", expr.schema_name().unwrap()); /// /// let expr = col("foo").alias("bar").eq(lit(11)); /// assert_eq!("bar = Int32(11)", expr.schema_name().unwrap()); /// ``` /// /// [`Schema`]: arrow::datatypes::Schema ``` ########## datafusion/expr/src/expr.rs: ########## @@ -983,10 +983,290 @@ impl PartialOrd for Expr { } impl Expr { - /// Returns the name of this expression as it should appear in a schema. This name - /// will not include any CAST expressions. - pub fn display_name(&self) -> Result<String> { - create_name(self) + /// Returns the name for schema / field that is different from Display + /// Most of the expressions are the same as Display + /// Those are the main difference + /// 1. Alias, where excludes expression + /// 2. Cast / TryCast, where takes expression only + pub fn schema_name(&self) -> Result<String> { Review Comment: If we are going to introduce a new API anyways, what do you think about making an API that doesn't *require* allocating a new string and making it infallable. Something like ```suggestion pub fn schema_name<'a>(&'a self) -> impl Display + 'a { ``` We could probably then implement a new type thing and mostly use the same code: ```rust // new type wrapper struct SchemaDisplay<'a>(&'a Expr); impl <'a> Display for SchemaDisplay<'a> { fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result { match self { // The same as Display Expr::Column(_) | Expr::Literal(_) | Expr::ScalarVariable(..) | Expr::Sort(_) | Expr::OuterReferenceColumn(..) | Expr::Placeholder(_) | Expr::Wildcard { .. } => write!(f, "{self}")?, ... } } return SchemaDisplay(self) ``` The benefit of doing this would be anywhere it needed to get formatted would not require a copy ```rust println!("This would not allocated a String: {}", expr.schema_name()); ``` ```rust // get schema name as a string: let schema_name = expr.schema_name().to_string(); ########## datafusion/substrait/tests/cases/consumer_integration.rs: ########## @@ -358,8 +358,8 @@ mod tests { let plan = from_substrait_plan(&ctx, &proto).await?; let plan_str = format!("{:?}", plan); - assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) * sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8) THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) / sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\ - \n Aggregate: groupBy=[[]], aggr=[[sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8) THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END), sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount)]]\ + assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) * sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE Utf8(\"PROMO%\") THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) / sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\ Review Comment: I think it makes sense as long as the cast is still in plans (which it is) ########## datafusion/expr/src/expr.rs: ########## @@ -983,10 +983,290 @@ impl PartialOrd for Expr { } impl Expr { - /// Returns the name of this expression as it should appear in a schema. This name Review Comment: Since this is a pub function what do you think about leaving it in marked as deprecated to help migration? I think this make the PR much easier to handle Something like ```rust #[deprecated(since = "31.0.0", note = "use data_type instead")] pub fn display_name(&self) -> Result<String> { self.schema_name() } ``` ########## datafusion/sqllogictest/test_files/array.slt: ########## @@ -1137,7 +1137,7 @@ from arrays_values_without_nulls; ## array_element (aliases: array_extract, list_extract, list_element) # Testing with empty arguments should result in an error -query error DataFusion error: Error during planning: Error during planning: array_element does not support zero arguments. +query error DataFusion error: Execution error: expect 2 args, got 0 Review Comment: the old error message was nicer -- maybe we can update the code that generates this message to be nicer too 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org