paleolimbot commented on PR #17986:
URL: https://github.com/apache/datafusion/pull/17986#issuecomment-3402811404

   Thank you for the review! I will clean this up today/tomorrow and ensure we 
can test all the spots.
   
   > I now have a bit more time on my hand and I'll try to revive that effort 
as we would be still interested in that.
   
   I'd love that! Perhaps the first step from the non-sorting angle might be to 
just consolidate all the places we print, cast, or test equality (which are 
currently scattered around the code base).
   
   A vague thought based mostly on this PR, SedonaDB's `SedonaType`, and the 
one linked (likely needs much refinement):
   
   ```rust
   pub fn resolve_df_data_type(session: Option<SessionSomething>, storage_type: 
&DataType, metadata: impl MetadataViewOfSomeSort ) -> Box<dyn DFDataType> {
     // sanity check to avoid hitting the lock on any registry whatsoever when 
not needed)
   
     // If the session is specified, use a session registry
   
     // otherwise, use a static or compile-time populated registry until such a 
time as the session
     // is made available everywhere and/or DataType::Extension is a thing
   }
   
   pub trait DFDataType: Clone {
     fn storage_type(&self) -> DataType;
     // e.g., the sanity check on the return field of a UDxF
     fn type_identical_to(&self, &dyn DFDataType) -> Result<bool>;
     // for when this has to be a storage field (it might already be one)
     fn storage_field(&self, name: Option<&str>, nullability: Option<bool>) -> 
FieldRef;
     fn add_to_existing_storage_field(&self
     // for when this has to be just metadata (e.g., the Expr::Literal)
     fn storage_field_metadata(&self) -> Option<FieldMetadata>;
     // to avoid dumping a Field debug string in user-facing errors
     fn display_type(&self) -> String;
     // for any hope of ever being able to deal with shredded/unshredded 
variants in UNION ALL
     // slash listing tables. This is useful for some geoarrow types too 
(wkb/wkb_view -> wkb)
     fn common_castable_type(&self, &dyn DFDataType) -> Result<Box<dyn 
DFDataType>> { /* self if types are identical or error */ }
     // usually printing the storage is OK but no ideal (e.g., variants should 
be jsonish)
     fn format_proxy(&self, storage: ArrayRef) -> Result<ArrayRef> { 
Ok(storage) }
     // Or something more advanced like the dynamic comparator in the linked PR
     fn sort_proxy(&self, storage: ArrayRef) -> Result<ArrayRef> { Ok(storage) }
     // It's a shame to bleed execution in here but casting shows up a lot
     fn cast_storage(&self, storage: ArrayRef, to: Box<dyn DFDataType>);
   }
   
   // All the structures that are currently communicating that type information
   
   impl DFDataType for FieldMetadata {
     // implement strict equality based on all metadata (or just extension 
name/extension metadata?)
     // printing based on metadata
     // everything else based on datatype
     // ...basically everything datafusion does right now
   }
   
   impl DFDataType for FieldRef {
     // defer to fieldmetadata
   }
   
   impl DFDataType for DataType {
    // defer to DataTypes
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to