jayzhan211 commented on issue #11513:
URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2380359012

   @findepi 
   
   I think the main idea of the issue is to find the mapping between arrow's 
DataType and DataFusion's logical type.
   In fact, we could forget about the logical type and process everything in 
terms of arrow's DataType. But to make it easier to enumerate all the 
semantically equivalent types, we need a more **simplified** type than arrow's 
DataType, and we use it whenever we only need the simplified version. In this 
case the LogicalType, which is the **simplified** version, should be smaller 
(have fewer variants) than arrow's DataType. Therefore, we can have a 
one-directional mapping from arrow's DataType, and from UserDefined/Extension 
types, to LogicalType. The LogicalType (DataFusion native type) is our single 
source of truth in DataFusion. It plays a role similar to Rust's native types. 
What we need is two kinds of traits for type mapping: one for UserDefined 
types, another for arrow's DataType. If their mapped types are the same, we 
can decode the value as the expected type; otherwise, it is a type mismatch.
   
   ```rust
   use std::sync::Arc;
   
   use arrow::datatypes::DataType;
   
   #[derive(Clone)]
   pub enum LogicalType {
       Int32,
       String,
       Float32,
       Float64,
       FixedSizeList(Box<LogicalType>, usize),
       // and more
       Extension(Arc<dyn ExtensionType>),
   }
   
   pub trait ExtensionType {
       fn logical_type(&self) -> LogicalType;
   }
   
   pub struct JsonType {}
   
   impl ExtensionType for JsonType {
       fn logical_type(&self) -> LogicalType {
           LogicalType::String
       }
   }
   
   pub struct GeoType {
       n_dim: usize
   }
   
   impl ExtensionType for GeoType {
       fn logical_type(&self) -> LogicalType {
           LogicalType::FixedSizeList(Box::new(LogicalType::Float64), self.n_dim)
       }
   }
   
   pub trait PhysicalType {
       fn logical_type(&self) -> LogicalType;
   }
   
   impl PhysicalType for DataType {
       fn logical_type(&self) -> LogicalType {
           match self {
               DataType::Int32 => LogicalType::Int32,
               DataType::FixedSizeList(f, n) => {
                   LogicalType::FixedSizeList(Box::new(f.data_type().logical_type()), *n as usize)
               }
               _ => todo!("")
           }
       }
   }
   ```
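
A minimal, dependency-free sketch of the "same mapped type means we can decode" check described above. To keep it compilable without the arrow crate, `ArrowType` is a hypothetical stand-in for a few variants of arrow's `DataType`, the `Extension` variant is left out of `LogicalType` so that `PartialEq` can be derived, and `can_decode` is an illustrative helper name, not an existing DataFusion API. Mapping both `Utf8` and `LargeUtf8` to `LogicalType::String` is my assumption of what "semantically equivalent" would look like.

```rust
// Hypothetical stand-in for a few variants of arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Int32,
    Utf8,
    LargeUtf8,
    Float64,
    FixedSizeList(Box<ArrowType>, i32),
}

// Simplified logical type; the Extension variant is omitted in this sketch
// so that equality can be derived.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    Int32,
    String,
    Float64,
    FixedSizeList(Box<LogicalType>, usize),
}

pub trait ExtensionType {
    fn logical_type(&self) -> LogicalType;
}

pub trait PhysicalType {
    fn logical_type(&self) -> LogicalType;
}

impl PhysicalType for ArrowType {
    fn logical_type(&self) -> LogicalType {
        match self {
            ArrowType::Int32 => LogicalType::Int32,
            // Assumption: several physical encodings collapse to one logical type.
            ArrowType::Utf8 | ArrowType::LargeUtf8 => LogicalType::String,
            ArrowType::Float64 => LogicalType::Float64,
            ArrowType::FixedSizeList(inner, n) => {
                LogicalType::FixedSizeList(Box::new(inner.logical_type()), *n as usize)
            }
        }
    }
}

pub struct JsonType;

impl ExtensionType for JsonType {
    fn logical_type(&self) -> LogicalType {
        LogicalType::String
    }
}

pub struct GeoType {
    pub n_dim: usize,
}

impl ExtensionType for GeoType {
    fn logical_type(&self) -> LogicalType {
        LogicalType::FixedSizeList(Box::new(LogicalType::Float64), self.n_dim)
    }
}

// Hypothetical helper: a value stored as `physical` can be decoded as the
// extension type only if both map to the same logical type.
pub fn can_decode(ext: &dyn ExtensionType, physical: &ArrowType) -> bool {
    ext.logical_type() == physical.logical_type()
}
```

With these definitions, `can_decode(&JsonType, &ArrowType::LargeUtf8)` holds because both sides map to `LogicalType::String`, while a `GeoType { n_dim: 3 }` checked against a two-element `FixedSizeList` of `Float64` is reported as a type mismatch.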
   
   I'd love to hear feedback on whether this makes sense, or where my 
assumptions about the type mapping might fail.



