jayzhan211 commented on code in PR #12853: URL: https://github.com/apache/datafusion/pull/12853#discussion_r1816904255
########## datafusion/common/src/types/native.rs: ########## @@ -188,6 +190,147 @@ impl LogicalType for NativeType { fn signature(&self) -> TypeSignature<'_> { TypeSignature::Native(self) } + + fn default_cast_for(&self, origin: &DataType) -> Result<DataType> { + use DataType::*; + + fn default_field_cast(to: &LogicalField, from: &Field) -> Result<FieldRef> { + Ok(Arc::new(Field::new( + to.name.clone(), + to.logical_type.default_cast_for(from.data_type())?, + to.nullable, + ))) + } + + Ok(match (self, origin) { + (Self::Null, _) => Null, + (Self::Boolean, _) => Boolean, + (Self::Int8, _) => Int8, + (Self::Int16, _) => Int16, + (Self::Int32, _) => Int32, + (Self::Int64, _) => Int64, + (Self::UInt8, _) => UInt8, + (Self::UInt16, _) => UInt16, + (Self::UInt32, _) => UInt32, + (Self::UInt64, _) => UInt64, + (Self::Float16, _) => Float16, + (Self::Float32, _) => Float32, + (Self::Float64, _) => Float64, + (Self::Decimal(p, s), _) if p <= &38 => Decimal128(*p, *s), + (Self::Decimal(p, s), _) => Decimal256(*p, *s), + (Self::Timestamp(tu, tz), _) => Timestamp(*tu, tz.clone()), + (Self::Date, _) => Date32, + (Self::Time(tu), _) => match tu { + TimeUnit::Second | TimeUnit::Millisecond => Time32(*tu), + TimeUnit::Microsecond | TimeUnit::Nanosecond => Time64(*tu), + }, + (Self::Duration(tu), _) => Duration(*tu), + (Self::Interval(iu), _) => Interval(*iu), + (Self::Binary, LargeUtf8) => LargeBinary, + (Self::Binary, Utf8View) => BinaryView, + (Self::Binary, _) => Binary, + (Self::FixedSizeBinary(size), _) => FixedSizeBinary(*size), + (Self::Utf8, LargeBinary) => LargeUtf8, + (Self::Utf8, BinaryView) => Utf8View, + (Self::Utf8, _) => Utf8, Review Comment: Some initial thoughts First, I think utf8view is the default type for string. In my proposed way, I think the mapping should be based on `arrow::can_cast_to` so if the 1st target type is not supported we go to the next one. But in this design, each type has a single target type to cast to. Is the default type always castable by arrow's `can_cast_to`? Maybe we can just add the check inside. Like ```rust (Self::Utf8, data_type) if can_cast_type(data_type, Utf8View)=> Utf8View, (Self::Utf8, data_type) if can_cast_type(data_type, LargeUtf8)=> LargeUtf8, (Self::Utf8, data_type) if can_cast_type(data_type, Utf8)=> Utf8, ``` Another issue is whether the default type is great for every use case, for example, I think utf8view should be the default type but is there any usecase that require utf8 instead? If yes, how can they modify it? It seems they need to copy the whole `default_cast_for` and change it for their logical type. (We can assume no such case for now, but just to point out that it may not work if there is such a case) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org