scovich commented on code in PR #9677:
URL: https://github.com/apache/arrow-rs/pull/9677#discussion_r3058464010
##########
arrow-schema/src/field.rs:
##########
@@ -504,13 +504,39 @@ impl Field {
.map(String::as_ref)
}
+ /// Returns `true` if this [`Field`] has the given [`ExtensionType`] name
+ /// and can be successfully validated as that extension type.
+ ///
+ /// This first checks the extension type name and only calls
+ /// [`ExtensionType::validate`] when the name matches.
+ ///
+ /// This is useful when you only need a boolean validity check and do not
+ /// need to retrieve the extension type instance.
+ #[inline]
+ pub fn has_valid_extension_type<E: ExtensionType>(&self) -> bool {
+ if self.extension_type_name() != Some(E::NAME) {
+ return false;
+ }
+
+ let ext_metadata = self
+ .metadata()
+ .get(EXTENSION_TYPE_METADATA_KEY)
+ .map(|s| s.as_str());
Review Comment:
nit?
```suggestion
.as_deref();
```
##########
arrow-schema/src/extension/canonical/bool8.rs:
##########
@@ -68,6 +68,10 @@ impl ExtensionType for Bool8 {
fn try_new(data_type: &DataType, _metadata: Self::Metadata) ->
Result<Self, ArrowError> {
Self.supports_data_type(data_type).map(|_| Self)
}
+
+ fn validate(data_type: &DataType, _metadata: Self::Metadata) -> Result<(),
ArrowError> {
+ Self.supports_data_type(data_type)
Review Comment:
Ah... `Self` is an empty struct, so it has no constructor (singleton).
##########
arrow-schema/src/extension/mod.rs:
##########
@@ -257,6 +257,15 @@ pub trait ExtensionType: Sized {
/// this extension type.
fn try_new(data_type: &DataType, metadata: Self::Metadata) -> Result<Self,
ArrowError>;
+ /// Validate this extension type for a field with the given data type and
+ /// metadata.
+ ///
+ /// The default implementation delegates to [`Self::try_new`]. Extension
+ /// types may override this to validate without constructing `Self`.
+ fn validate(data_type: &DataType, metadata: Self::Metadata) -> Result<(),
ArrowError> {
+ Self::try_new(data_type, metadata).map(|_| ())
Review Comment:
I fear we have an API clash here:
* `supports_data_type` receives `&self`, in order to allow configurable
extension types based on their metadata
* `validate` receives metadata but can only use it by instantiating Self
(which requires the very allocation we wanted to avoid).
Ideally, `supports_data_type` should be implemented in terms of `validate`
instead... but I guess that would be a breaking change?
The next best would be to chase down every extension type that actually has
metadata, and implement the two methods in the correct direction.
Does any impl in the arrow crate actually use this provided method? Or is it
just a safety net for third party impl to avoid a breaking change?
##########
arrow-schema/src/extension/canonical/bool8.rs:
##########
@@ -68,6 +68,10 @@ impl ExtensionType for Bool8 {
fn try_new(data_type: &DataType, _metadata: Self::Metadata) ->
Result<Self, ArrowError> {
Self.supports_data_type(data_type).map(|_| Self)
}
+
+ fn validate(data_type: &DataType, _metadata: Self::Metadata) -> Result<(),
ArrowError> {
+ Self.supports_data_type(data_type)
Review Comment:
I didn't know this was valid syntax... but if it is, why do we need
`Self::default().supports_data_type` below?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]