alamb commented on code in PR #18552:
URL: https://github.com/apache/datafusion/pull/18552#discussion_r2691287946
##########
datafusion/expr/src/registry.rs:
##########
@@ -215,3 +217,232 @@ impl FunctionRegistry for MemoryFunctionRegistry {
self.udwfs.keys().cloned().collect()
}
}
+
+/// The registration of an extension type.
+///
+/// Implementations of this trait are responsible for *creating* instances of
[LogicalType] that
+/// represent the semantics of an extension type. One cannot directly register
the [LogicalType]
+/// instances because some extension types may have parameters that are
unknown at compile time
+/// (e.g., the unknown type in [Opaque](arrow_schema::extension::Opaque)).
+pub trait ExtensionTypeRegistration: Debug + Send + Sync {
Review Comment:
I don't fully follow the need for this new trait or the use of Opaque. If
an extension type is unknown (aka Opaque)
I guess I would expect that if a logical type isn't registered, that
DataFusion would basically treat those types with the normal handling for the
physical type
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -694,13 +694,7 @@ impl<'a> ConstEvaluator<'a> {
let metadata = phys_expr
.return_field(self.input_batch.schema_ref())
.ok()
- .and_then(|f| {
- let m = f.metadata();
- match m.is_empty() {
- true => None,
- false => Some(FieldMetadata::from(m)),
- }
- });
+ .map(|f| FieldMetadata::from(f.as_ref()));
Review Comment:
❤️
That is certainly nicer
##########
datafusion/core/tests/extension_types/pretty_printing.rs:
##########
@@ -0,0 +1,81 @@
+use arrow::array::{FixedSizeBinaryArray, RecordBatch};
Review Comment:
Thank you for this end to end test. It is really helpful to see the new code
in action
##########
datafusion/core/tests/extension_types/pretty_printing.rs:
##########
@@ -0,0 +1,81 @@
+use arrow::array::{FixedSizeBinaryArray, RecordBatch};
+use arrow_schema::extension::Uuid;
+use arrow_schema::{DataType, Field, Schema, SchemaRef};
+use datafusion::dataframe::DataFrame;
+use datafusion::error::Result;
+use datafusion::prelude::SessionContext;
+use datafusion_common::metadata::FieldMetadata;
+use datafusion_common::ScalarValue;
+use datafusion_expr::{col, lit_with_metadata};
+use insta::assert_snapshot;
+use std::sync::Arc;
+
+fn test_schema() -> SchemaRef {
+ Arc::new(Schema::new(vec![uuid_field()]))
+}
+
+fn uuid_field() -> Field {
+ Field::new("my_uuids", DataType::FixedSizeBinary(16),
false).with_extension_type(Uuid)
+}
+
+async fn create_test_table() -> Result<DataFrame> {
+ let schema = test_schema();
+
+ // define data.
+ let batch = RecordBatch::try_new(
+ schema,
+ vec![Arc::new(FixedSizeBinaryArray::from(vec![
+ &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+ &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 5, 6],
+ ]))],
+ )?;
+
+ let ctx = SessionContext::new();
Review Comment:
I would have expected that the example would need to register the `Uuid`
extension type in order to use it. Something like
```rust
ctx.register_extension_type(Uuid::new())
```
It looks like maybe it is registered here:
https://github.com/apache/datafusion/blob/3db5082869d9d38d919bf492f462a5ff7b9798ad/datafusion/core/src/execution/session_state_defaults.rs#L131-L134
```rust
/// returns the list of default [`WindowUDF`]s
pub fn default_extension_types() -> Vec<ExtensionTypeRegistrationRef> {
vec![SimpleExtensionTypeRegistration::new_arc(
Uuid::NAME,
Arc::new(Uuid),
)]
}
```
I would personally suggest starting with extension types that must be
manually registered, and then we can move on to deciding which ones should be
included by default.
Doing this would also serve as a nice example to help others implementing
them
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]