jayzhan211 commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2247822197
> > But we actually have List(UInt8) inside arrow::Schema, so we can decode the type with List(UInt8). Therefore, we have UserDefinedType(Utf8) in the logical planning step, and List(UInt8) in the physical planning step
>
> Makes sense and I like this approach. The only thing I'm still not understanding is how a type source (like a TableProvider) would communicate to the logical layer that the `List(UInt8)` is actually a logical `Utf8`.
>
> > The most important thing is that we could keep DFSchema mostly the same as it is now. But we introduce a bunch of new LogicalType and Trait for UserDefinedType to build around it.
>
> This would be tremendously beneficial to this proposal :)

I think we could register the type mapping in the session state, the same way we register functions. We define only the relation from Arrow DataType to LogicalType. We can then look up the registered relations and figure out that the logical type of `List(UInt8)` is `Utf8`.

```rust
pub enum LogicalType {
    Utf8,
    Float,
    FixedSizeList(Box<LogicalType>, usize),
}

pub trait TypeRelation {
    fn as_any(&self) -> &dyn Any { self }

    fn get_logical_type(&self, data_type: &DataType) -> Result<LogicalType> {
        _not_impl_err!("nyi")
    }
}

pub struct ListOfU8AsStringType {}

impl TypeRelation for ListOfU8AsStringType {
    fn get_logical_type(&self, data_type: &DataType) -> Result<LogicalType> {
        match data_type {
            // List(UInt8) is treated as a logical Utf8.
            DataType::List(field) if field.data_type() == &DataType::UInt8 => {
                Ok(LogicalType::Utf8)
            }
            _ => _not_impl_err!("nyi"),
        }
    }
}

pub struct GeoType {}

impl TypeRelation for GeoType {
    fn get_logical_type(&self, data_type: &DataType) -> Result<LogicalType> {
        match data_type {
            DataType::FixedSizeList(field, 2) if field.data_type() == &DataType::Float32 => {
                Ok(LogicalType::FixedSizeList(Box::new(LogicalType::Float), 2))
            }
            _ => _not_impl_err!("nyi"),
        }
    }
}

pub struct DatafusionBuiltinType {}

impl TypeRelation for DatafusionBuiltinType {
    fn get_logical_type(&self, data_type: &DataType) -> Result<LogicalType> {
        match data_type {
            DataType::Utf8View => Ok(LogicalType::Utf8),
            _ => _not_impl_err!("nyi"),
        }
    }
}

// Ideally, functions in the logical layer care about LogicalType only,
// and functions in the physical layer care about Arrow DataType only.
fn any_function(udt: Arc<dyn TypeRelation>, data_type: DataType) -> Result<()> {
    let logical_type = udt.get_logical_type(&data_type)?;
    match logical_type {
        LogicalType::FixedSizeList(inner_type, size) => Ok(()),
        _ => todo!(),
    }
}

impl SessionState {
    fn register_type_relation(&self) -> Result<()> {
        // Similar to the idea of registering functions, so we can have
        // user-defined type relations (mappings).
        self.register(DatafusionBuiltinType);
        self.register(ListOfU8AsStringType);
        self.register(GeoType);
        Ok(())
    }
}

impl TableProvider for MyTable {
    fn some_func(&self, state: CatalogSessions) -> Result<()> {
        let udt: Arc<dyn TypeRelation> = self.get_type_relation();
        let dfschema = DFSchema::empty();
        let expr: Expr; // placeholder for some expression
        let data_type: arrow::DataType = expr.get_type(&dfschema)?;
        any_function(udt, data_type)
    }
}
```
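To make the lookup step concrete, here is a minimal, self-contained sketch of the registry idea. It is only an illustration under assumptions, not DataFusion's API: the `DataType` enum is a simplified stand-in for `arrow::datatypes::DataType`, and `TypeRegistry`, `resolve`, and `example_registry` are hypothetical names. It also returns `Option` instead of `Result` so that a relation can signal "not my type" and the registry can fall through to the next one.

```rust
use std::sync::Arc;

/// Simplified stand-in for `arrow::datatypes::DataType` (assumption: only
/// the variants needed for this sketch, with boxed inner types, not fields).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt8,
    Float32,
    Utf8View,
    List(Box<DataType>),
    FixedSizeList(Box<DataType>, usize),
}

#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    Utf8,
    Float,
    FixedSizeList(Box<LogicalType>, usize),
}

pub trait TypeRelation {
    /// Returns `None` when this relation does not cover `data_type`.
    fn get_logical_type(&self, data_type: &DataType) -> Option<LogicalType>;
}

pub struct DatafusionBuiltinType;
pub struct ListOfU8AsStringType;
pub struct GeoType;

impl TypeRelation for DatafusionBuiltinType {
    fn get_logical_type(&self, data_type: &DataType) -> Option<LogicalType> {
        match data_type {
            DataType::Utf8View => Some(LogicalType::Utf8),
            _ => None,
        }
    }
}

impl TypeRelation for ListOfU8AsStringType {
    fn get_logical_type(&self, data_type: &DataType) -> Option<LogicalType> {
        match data_type {
            DataType::List(inner) if **inner == DataType::UInt8 => Some(LogicalType::Utf8),
            _ => None,
        }
    }
}

impl TypeRelation for GeoType {
    fn get_logical_type(&self, data_type: &DataType) -> Option<LogicalType> {
        match data_type {
            DataType::FixedSizeList(inner, 2) if **inner == DataType::Float32 => {
                Some(LogicalType::FixedSizeList(Box::new(LogicalType::Float), 2))
            }
            _ => None,
        }
    }
}

/// Hypothetical registry, standing in for what the session state would hold.
#[derive(Default)]
pub struct TypeRegistry {
    relations: Vec<Arc<dyn TypeRelation>>,
}

impl TypeRegistry {
    pub fn register(&mut self, relation: Arc<dyn TypeRelation>) {
        self.relations.push(relation);
    }

    /// Asks each relation in registration order; the first match wins.
    pub fn resolve(&self, data_type: &DataType) -> Option<LogicalType> {
        self.relations
            .iter()
            .find_map(|relation| relation.get_logical_type(data_type))
    }
}

/// Builds a registry with the three relations from the comment above.
pub fn example_registry() -> TypeRegistry {
    let mut registry = TypeRegistry::default();
    registry.register(Arc::new(DatafusionBuiltinType));
    registry.register(Arc::new(ListOfU8AsStringType));
    registry.register(Arc::new(GeoType));
    registry
}
```

With this shape, `example_registry().resolve(&DataType::List(Box::new(DataType::UInt8)))` yields `Some(LogicalType::Utf8)`, while a type that no relation covers yields `None`. First-match-wins is just one possible policy; a real design would need to decide how conflicting relations are handled.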