paleolimbot commented on code in PR #18120:
URL: https://github.com/apache/datafusion/pull/18120#discussion_r2466593754
##########
datafusion/expr/src/expr.rs:
##########
@@ -3283,11 +3280,11 @@ impl Display for Expr {
}
write!(f, "END")
}
- Expr::Cast(Cast { expr, data_type }) => {
- write!(f, "CAST({expr} AS {data_type})")
+ Expr::Cast(Cast { expr, field }) => {
+ write!(f, "CAST({expr} AS {})", field.data_type())
Review Comment:
There's now a `format_type_and_metadata()` helper in datafusion_common that
will ensure that casts with metadata are displayed. That display is a bit
verbose at the moment when metadata exists but it makes it quite a bit easier
to track down bugs where the metadata was dropped.
##########
datafusion/expr/src/expr.rs:
##########
@@ -3673,7 +3670,7 @@ mod test {
fn format_cast() -> Result<()> {
let expr = Expr::Cast(Cast {
expr: Box::new(Expr::Literal(ScalarValue::Float32(Some(1.23)),
None)),
- data_type: DataType::Utf8,
+ field: Arc::new(Field::new("cast", DataType::Utf8, false)),
Review Comment:
```suggestion
field: DataType::Utf8.into_nullable_field_ref(),
```
@alamb kindly added a helper for this (there are many places in this PR that
would benefit!)
An aside is that I think we should ensure that these fields are always
unnamed for consistency (and that the names of such a field are never used).
##########
datafusion/proto/proto/datafusion.proto:
##########
Review Comment:
I think many of these changes should be reverted (I don't think this file is
generated?)
##########
datafusion/proto/proto/datafusion.proto:
##########
@@ -592,12 +540,12 @@ message WhenThen {
message CastNode {
LogicalExprNode expr = 1;
- datafusion_common.ArrowType arrow_type = 2;
+ datafusion_common.Field field = 2;
Review Comment:
It is also possible to add metadata + nullability here to keep the protobuf
backward compatible (not a requirement but is relatively easy to do)
##########
datafusion/expr/src/expr_fn.rs:
##########
@@ -326,14 +326,75 @@ pub fn rollup(exprs: Vec<Expr>) -> Expr {
Expr::GroupingSet(GroupingSet::Rollup(exprs))
}
+/// Types that can be used to describe the result of a cast expression.
+pub trait IntoCastField {
+ fn into_cast_field(self, expr: &Expr) -> FieldRef;
+}
+
+impl IntoCastField for FieldRef {
+ fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+ self
+ }
+}
+
+impl IntoCastField for &FieldRef {
+ fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+ Arc::clone(self)
+ }
+}
+
+impl IntoCastField for Field {
+ fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+ Arc::new(self)
+ }
+}
+
+impl IntoCastField for &Field {
+ fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+ Arc::new(self.clone())
+ }
+}
+
+impl IntoCastField for DataType {
+ fn into_cast_field(self, expr: &Expr) -> FieldRef {
+ let nullable = infer_cast_nullability(expr);
+ Arc::new(Field::new("", self, nullable))
+ }
+}
+
+fn infer_cast_nullability(expr: &Expr) -> bool {
+ match expr {
+ Expr::Literal(value, _) => value.is_null(),
+ Expr::Cast(Cast { field, .. }) | Expr::TryCast(TryCast { field, .. })
=> {
+ field.is_nullable()
+ }
+ _ => true,
+ }
+}
+
/// Create a cast expression
-pub fn cast(expr: Expr, data_type: DataType) -> Expr {
- Expr::Cast(Cast::new(Box::new(expr), data_type))
+pub fn cast<F>(expr: Expr, field: F) -> Expr
+where
+ F: IntoCastField,
+{
+ let field = field.into_cast_field(&expr);
+ Expr::Cast(Cast::new(Box::new(expr), field))
Review Comment:
I see that a number of references to `cast()` were updated as
`cast(Arc::new(Field::new(...)))`...I wonder if those can be reverted (in
theory `IntoCastField` should handle the conversion?)
##########
datafusion/expr/src/expr_fn.rs:
##########
@@ -326,14 +326,75 @@ pub fn rollup(exprs: Vec<Expr>) -> Expr {
Expr::GroupingSet(GroupingSet::Rollup(exprs))
}
+/// Types that can be used to describe the result of a cast expression.
+pub trait IntoCastField {
+ fn into_cast_field(self, expr: &Expr) -> FieldRef;
+}
Review Comment:
These are somewhat similar to the `FieldExt` traits that Andrew added to
datafusion_common/src/datatypes.rs. I think DataFusion would benefit if it had
a way to say "get me a FieldRef that is actually a data type", but I am also
not sure this PR is the right place to do that.
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -579,10 +580,13 @@ impl ExprSchemable for Expr {
func.return_field_from_args(args)
}
// _ => Ok((self.get_type(schema)?, self.nullable(schema)?)),
- Expr::Cast(Cast { expr, data_type }) => expr
- .to_field(schema)
- .map(|(_, f)|
f.as_ref().clone().with_data_type(data_type.clone()))
- .map(Arc::new),
+ Expr::Cast(Cast { expr, field }) | Expr::TryCast(TryCast { expr,
field }) => {
+ let (_, input_field) = expr.to_field(schema)?;
+ let mut combined_metadata =
FieldMetadata::from(input_field.metadata());
+
combined_metadata.extend(FieldMetadata::from(field.metadata()));
+ let field =
combined_metadata.add_to_field(field.as_ref().clone());
+ Ok(Arc::new(field))
Review Comment:
I would prefer not to combine metadata here...this would mean that a `Cast`
for an extension type would basically be a "reinterpret cast" not a "logical
cast", which is what I believe the `Cast` is trying to represent in a logical
plan. The `Alias` can currently add metadata in this way which I think is a bit
fishy but is at least not attempting to represent a logical operation.
##########
datafusion/expr/src/expr.rs:
##########
@@ -794,14 +794,14 @@ pub enum GetFieldAccess {
pub struct Cast {
/// The expression being cast
pub expr: Box<Expr>,
- /// The `DataType` the expression will yield
- pub data_type: DataType,
+ /// Field describing the result of the cast, including metadata
+ pub field: FieldRef,
}
impl Cast {
/// Create a new Cast expression
- pub fn new(expr: Box<Expr>, data_type: DataType) -> Self {
- Self { expr, data_type }
+ pub fn new(expr: Box<Expr>, field: FieldRef) -> Self {
+ Self { expr, field }
Review Comment:
This change (and the corresponding change for `TryCast::new()` seem to have
had caused significant disruption...it is probably worth keeping the original
signature and adding a new function specifically for casts with metadata and/or
nullability to minimize the disruption in DataFusion and downstream.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]