paleolimbot commented on PR #17986:
URL: https://github.com/apache/datafusion/pull/17986#issuecomment-3402811404
Thank you for the review! I will clean this up today/tomorrow and ensure we
can test all the spots.
> I now have a bit more time on my hand and I'll try to revive that effort
as we would be still interested in that.
I'd love that! Perhaps the first step from the non-sorting angle might be to
just consolidate all the places we print, cast, or test equality (which are
currently scattered around the code base).
A vague thought based mostly on this PR, SedonaDB's `SedonaType`, and the
one linked (likely needs much refinement):
```rust
pub fn resolve_df_data_type(session: Option<SessionSomething>, storage_type:
&DataType, metadata: impl MetadataViewOfSomeSort ) -> Box<dyn DFDataType> {
// sanity check to avoid hitting the lock on any registry whatsoever when
not needed)
// If the session is specified, use a session registry
// otherwise, use a static or compile-time populated registry until such a
time as the session
// is made available everywhere and/or DataType::Extension is a thing
}
pub trait DFDataType: Clone {
fn storage_type(&self) -> DataType;
// e.g., the sanity check on the return field of a UDxF
fn type_identical_to(&self, &dyn DFDataType) -> Result<bool>;
// for when this has to be a storage field (it might already be one)
fn storage_field(&self, name: Option<&str>, nullability: Option<bool>) ->
FieldRef;
fn add_to_existing_storage_field(&self
// for when this has to be just metadata (e.g., the Expr::Literal)
fn storage_field_metadata(&self) -> Option<FieldMetadata>;
// to avoid dumping a Field debug string in user-facing errors
fn display_type(&self) -> String;
// for any hope of ever being able to deal with shredded/unshredded
variants in UNION ALL
// slash listing tables. This is useful for some geoarrow types too
(wkb/wkb_view -> wkb)
fn common_castable_type(&self, &dyn DFDataType) -> Result<Box<dyn
DFDataType>> { /* self if types are identical or error */ }
// usually printing the storage is OK but no ideal (e.g., variants should
be jsonish)
fn format_proxy(&self, storage: ArrayRef) -> Result<ArrayRef> {
Ok(storage) }
// Or something more advanced like the dynamic comparator in the linked PR
fn sort_proxy(&self, storage: ArrayRef) -> Result<ArrayRef> { Ok(storage) }
// It's a shame to bleed execution in here but casting shows up a lot
fn cast_storage(&self, storage: ArrayRef, to: Box<dyn DFDataType>);
}
// All the structures that are currently communicating that type information
impl DFDataType for FieldMetadata {
// implement strict equality based on all metadata (or just extension
name/extension metadata?)
// printing based on metadata
// everything else based on datatype
// ...basically everything datafusion does right now
}
impl DFDataType for FieldRef {
// defer to fieldmetadata
}
impl DFDataType for DataType {
// defer to DataTypes
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]